Abstract



Top-down and bottom-up theorem proving approaches each have specific advantages 
and disadvantages. Bottom-up provers profit from strong redundancy control but suffer
from the lack of goal-orientation, whereas top-down provers are goal-oriented but often have
weak calculi when their proof lengths are considered. In order to integrate both approaches,
we try to achieve cooperation between a top-down and a bottom-up prover in two different
ways. The first technique aims at supporting a bottom-up prover with a top-down prover: a
top-down prover generates subgoal clauses, which are then processed by a bottom-up prover. The
second technique deals with the use of bottom-up generated lemmas in a top-down prover.
We apply our concept to the areas of model elimination and superposition. We discuss 
the ability of our techniques to shorten proofs as well as to reorder the search space in an 
appropriate manner. Furthermore, in order to identify subgoal clauses and lemmas which 
are actually relevant for the proof task, we develop methods for a relevancy-based filtering. 
Experiments with the provers Setheo and Spass, performed on problems from the TPTP library,
reveal the high potential of our cooperation approaches.

1. Introduction 

Automated deduction is, at its lowest level, a search problem that spans huge search
spaces. In the past many different calculi have been developed in order to cope with 
problems from the area of automated theorem proving. Essentially, for first-order theorem 
proving two main paradigms for calculi are in use: Top-down calculi like model elimination 
(ME, Loveland, 1968, 1978) attempt to recursively break down and transform a goal into 
subgoals that can finally be proven immediately with the axioms or with assumptions made 
during the proof. Bottom-up calculi like superposition (e.g., Bachmair & Ganzinger, 1994) 
go the other way by producing logical consequences from the input set until an obvious
inconsistency is derived. 

When comparing results of various provers (e.g., Sutcliffe & Suttner, 1997) it is obvious
that provers based on different paradigms often behave quite differently. There are problems
where bottom-up theorem provers perform well but top-down provers perform poorly,
and vice versa. The main reason for this is that bottom-up provers often suffer from the
lack of goal-orientation of their search, but profit from their strong redundancy control
mechanisms. In contrast, top-down provers profit from their goal-orientation but suffer
from insufficient redundancy control. This entails long proofs for many problems
(e.g., Letz et al., 1994). Therefore, a topic that has come into the focus of research is the
integration of both approaches. Specifically, cooperation between theorem provers (e.g., 
Conry et al., 1990; Schumann, 1994; Denzinger, 1995; Bonacina & Hsiang, 1995; Bonacina, 
1996; Wolf & Fuchs, 1997; Fuchs, 1998b, 1998c) based on top-down and bottom-up principles 
appears to be promising because by exchanging information each approach can profit from 
the other. It is also possible to modify calculi or provers which work according to one
paradigm so as to introduce aspects of the other paradigm into them. This, however, requires
considerable implementation effort to modify the provers, whereas our approach requires no
changes to the provers, only changes to their input. We can hence employ arbitrary
state-of-the-art provers.

Information that is well-suited for improving the performance of top-down provers is
provided by lemmas deduced by bottom-up provers. These lemmas are added to the input of a
top-down prover and can help to shorten the proof length by immediately solving subgoals.
Normally, the employed proof procedures can profit significantly from the resulting proof
length reduction. The use of lemmas, however, also imports additional redundancy into
the calculus. This means that an unbounded use of bottom-up generated lemmas without
techniques for choosing only relevant lemmas (i.e. lemmas which lead to a reduction
of the search effort needed to enumerate a proof) is not sensible. In this article, in contrast to other
approaches which generate lemmas dynamically during the proof run (Astrachan & Stickel,
1992; Astrachan & Loveland, 1997), we employ a bottom-up prover for generating
lemmas in a preprocessing phase. After a pool of lemma candidates has been generated,
relevant lemmas are selected from this pool and the formula to be refuted is augmented by
these bottom-up generated formulas.

The second main aspect that we consider is top-down/bottom-up integration by
transferring information from a top-down prover to a bottom-up prover (e.g., Fuchs, 1998a).
Our approach is to transfer top-down generated subgoal clauses, which essentially
represent a transformation of an original goal clause into subgoals, to a bottom-up prover and
to augment its input by these clauses. This introduces a goal-oriented component into a
bottom-up prover which can enable it to solve proof problems considerably faster.
However, as is the case with lemmas, an unbounded transfer of subgoal clauses is not sensible.
Thus, we again generate subgoal clauses in a preprocessing phase and integrate only some
of these clauses into the input set of a bottom-up prover. This necessitates techniques for
selecting relevant subgoal clauses, i.e. techniques for selecting a set of subgoal clauses which
can decrease the search effort a bottom-up prover needs in order to find a proof.

In order to examine this kind of top-down/bottom-up integration we restrict ourselves 
to the bottom-up superposition calculus and the top-down connection tableau calculus. 
These calculi are very important since they are the basis for many high-performance
theorem provers. For instance, the bottom-up provers Spass (Weidenbach et al., 1996) and
Gandalf (Tammet, 1997), which were most successful in recent proving competitions, employ
superposition and ordered paramodulation, respectively. The connection tableau calculus
(or its restriction model elimination) is also the basis for very successful top-down provers,
e.g., Setheo (Moser et al., 1997) or METEOR (Astrachan & Loveland, 1991). In our
opinion the concepts introduced for superposition and the connection tableau calculus can
rather easily be transferred to other bottom-up and top-down calculi. Hence, the choice of
these two calculi is justified.
The article is organized as follows. We start with a brief overview of superposition and
model elimination (Section 2). Moreover, we discuss strengths and weaknesses of the calculi
in more detail and introduce our approach for combining the strengths of both calculi. In
Section 3 we address effects of the integration of ME subgoal clauses into the search state of
a superposition-based prover. Furthermore, we describe two variants of a relevancy-based
filtering of subgoal clauses. Section 4 deals with the use of bottom-up generated lemmas. We
discuss in detail the ability of the produced clauses to help decrease proof lengths
for refuting a given set of clauses as well as to reorder the search space in an appropriate
manner. Based on this discussion, we introduce several relevancy measures. In Section 5, 
an experimental study conducted with the high-performance theorem provers Setheo and 
Spass reveals the potential of our techniques. We have chosen these systems in order to 
show that our concept can easily be integrated into existing systems and is even able to 
improve on the performance of very powerful provers. Finally, in Section 6 an overview of 
related approaches for top-down/bottom-up integration concludes the article. 

2. A Framework for Coupling Top-Down and Bottom-Up Provers 

In the following, we introduce typical representatives of top-down and bottom-up calculi 
and discuss their strengths and weaknesses in detail. After that, we sketch the basics of our 
methodology in order to combine these calculi. 

2.1 Automated Theorem Proving with Superposition and Model Elimination 

The general problem in first-order theorem proving is to show the inconsistency of a set C 
of clauses. A clause is a set of literals. As already discussed, theorem provers usually utilize 
either top-down or bottom-up calculi for accomplishing this task. 

Typically, a bottom-up calculus contains several inference rules which can be applied 
to a set of clauses that constitute the search state. Generally, the inference rules can be 
divided into two classes: Expansion inference rules permit the generation of new clauses 
and contraction inference rules delete clauses or replace them by others. The most popular 
bottom-up calculus is the resolution calculus (Robinson, 1965). There, the expansion rules 
are resolution and factoring. The resolution calculus can be extended with contraction 
rules, e.g. the deletion of tautologies. If equality is involved in a problem it is sensible to 
employ the superposition calculus (Bachmair & Ganzinger, 1994), which extends resolution 
with specific rules suitable for handling equations. The expansion rules of the superposition 
calculus are superposition, equality resolution, and equality factoring. Again, additional 
contraction rules such as tautology deletion, subsumption, condensing, and rewriting can 
be employed. We emphasize that for our study we employ the version of the
superposition calculus introduced by Bachmair and Ganzinger (1994). Specifically, this
entails that factoring is only applied to positive literals.

A bottom-up theorem prover usually maintains a set $\mathcal{F}^P$ of so-called potential or passive
clauses from which it selects and removes one clause C at a time. This clause is put into
the set $\mathcal{F}^A$ of activated clauses. Activated clauses are, unlike potential clauses, allowed to
produce new clauses via the application of some inference rules. The inferred new clauses
are put into $\mathcal{F}^P$. Initially, $\mathcal{F}^A = \emptyset$ and $\mathcal{F}^P = \mathcal{C}$. The nondeterministic selection or activation
step is realized by heuristic means. To this end, a heuristic $\mathcal{H}$ associates a natural number
$w_C \in \mathbb{N}$ with each $C \in \mathcal{F}^P$, and the $C \in \mathcal{F}^P$ with the smallest weight $w_C$ is selected.
An important property of heuristics is their fairness. A heuristic is called fair if it selects
potential clauses in such a manner that no clause remains passive infinitely long. Usually
the fairness of the used heuristic implies that the prover is complete, i.e. it can derive the
empty clause when given an inconsistent input set (provided the underlying calculus is
complete).
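The selection loop just described can be illustrated with a minimal sketch. It assumes ground (propositional) clauses written as sets of strings such as `"p"` and `"~p"`, uses plain binary resolution with tautology deletion in place of full superposition, and uses clause length with a FIFO tie-break as a hypothetical weight $w_C$. The function names are illustrative only; this is not how Spass or any other real prover is implemented.

```python
import heapq
from itertools import count


def neg(lit):
    """Complement of a ground literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """Binary resolvents of two ground clauses, with tautologies deleted."""
    for lit in c1:
        if neg(lit) in c2:
            r = (c1 - {lit}) | (c2 - {neg(lit)})
            if not any(neg(x) in r for x in r):  # contraction: tautology deletion
                yield frozenset(r)


def given_clause_loop(clauses, weight=len, max_steps=1000):
    """Return the number of activation steps until the empty clause is
    activated, or None if F^P runs empty (saturation) or max_steps is hit."""
    tick = count()  # FIFO tie-breaker among clauses of equal weight
    passive = []    # F^P as a heap of (w_C, arrival, C)
    for c in clauses:
        c = frozenset(c)
        heapq.heappush(passive, (weight(c), next(tick), c))
    active = set()  # F^A, initially empty
    steps = 0
    while passive and steps < max_steps:
        _, _, given = heapq.heappop(passive)  # clause with the smallest weight
        if given in active:
            continue  # duplicate of an already activated clause
        steps += 1
        if not given:
            return steps  # the empty clause was activated
        active.add(given)
        for other in list(active):
            for r in resolvents(given, other):
                heapq.heappush(passive, (weight(r), next(tick), r))
    return None
```

Since resolvents are only produced when a new clause is activated, and only finitely many ground clauses exist over the input atoms, this toy loop always terminates; fairness in the first-order case is a much more delicate matter.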

The main strength of bottom-up calculi and provers is their strong redundancy control.
On the one hand, many inferences which are definitely unnecessary in a proof, e.g.
inferences involving tautologies, are omitted. On the other hand, contraction inference rules like
subsumption avoid the repetition of expansion inferences involving the same (or more
instantiated) clauses. A big disadvantage of bottom-up calculi is their lack of goal-orientation.
Because certain inferences are favored over others due to the fixed search strategy and the
heuristic weights of the clauses involved, it might be the case that for a very long time only
clauses which are not part of any proof are enumerated.

Model elimination is a typical top-down calculus which we shall introduce in the form
of the connection tableau calculus (CTC) (Letz et al., 1994). In order to introduce CTC we
start with the basic (free variable) tableau calculus (e.g., Fitting, 1996) for clauses.
A tableau T for C is a tree whose non-root nodes are labeled with literals and that fulfills
the following condition: If the immediate successor nodes of a node u of T are labeled
with literals $l_1, \ldots, l_n$, then the clause $\{l_1, \ldots, l_n\}$ (tableau clause) is an instance of a clause
in C. In the tableau calculus two inference rules are used, namely the expansion and the
reduction rule (e.g., Fitting, 1996). An application of the expansion rule means selecting a
clause from C and attaching the literals of a variant of it to a subgoal s, which is a literal
at the leaf of an open branch (a branch that does not contain two complementary literals).
Tableau reduction closes a branch by unifying a subgoal s with the complement of a literal
r (denoted by $\sim r$) on the same branch, and applying the substitution to the whole tableau.

Connection tableau calculi work on connected tableaux. A tableau is called connected
or a connection tableau if each inner node v (non-leaf node) which is labeled with a literal
$l$ has a leaf node $v'$ among its immediate successor nodes that is labeled with a literal $l'$
complementary to $l$. The inference rules are start, extension, and reduction. The start rule is
always the first inference step of a derivation. It permits a tableau expansion that can only
be applied to a trivial tableau, i.e. one consisting of only one node. Note that the start rule
can be restricted to so-called start relevant clauses without causing incompleteness. Start
relevancy of a clause is defined as follows. If $\mathcal{C}$ is an unsatisfiable set of clauses, we call
$S \in \mathcal{C}$ start relevant if there is a satisfiable subset $\mathcal{C}' \subseteq \mathcal{C}$ such that $\mathcal{C}' \cup \{S\}$ is unsatisfiable.
Since the set of negative clauses contains at least one start relevant clause, we also consider
a restricted calculus which only employs negative clauses for the start expansion (CTC_neg).
The reduction rule is the same as in the conventional tableau calculus. Extension is a
combination of expansion and reduction. It is performed by selecting a subgoal s in the
tableau T, applying an expansion step to s, and immediately performing a reduction step
with s and one of its newly created successors. Note that in the area of Horn clauses it
is sufficient to employ start and extension, i.e. the reduction inference is unnecessary (e.g.,
Antoniou & Langetepe, 1994). Thus, we assume that we use versions of CTC or CTC_neg
that do not employ reduction steps in the area of Horn clauses.
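For ground clause sets, the definition of start relevancy can be checked directly. The sketch below uses hypothetical helper names (not part of any prover) and brute force: it enumerates every subset $\mathcal{C}'$ and every truth assignment, so it is exponential and only meant for toy examples. Literals are assumed to be strings with `~` marking negation.

```python
from itertools import combinations, product


def satisfiable(clauses):
    """Brute-force satisfiability test for ground clauses ('p' / '~p' literals)."""
    atoms = sorted({lit.lstrip("~") for c in clauses for lit in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        # a literal is true iff its atom's value differs from "is negated"
        if all(any(model[l.lstrip("~")] != l.startswith("~") for l in c)
               for c in clauses):
            return True
    return False


def start_relevant(clauses):
    """Return the clauses S of an unsatisfiable set C for which a satisfiable
    subset C' of C exists such that C' together with S is unsatisfiable."""
    clauses = list(dict.fromkeys(frozenset(c) for c in clauses))
    relevant = []
    for s in clauses:
        others = [c for c in clauses if c != s]
        if any(satisfiable(sub) and not satisfiable(list(sub) + [s])
               for r in range(len(others) + 1)
               for sub in combinations(others, r)):
            relevant.append(s)
    return relevant
```

For example, in $\{\{\neg q\}, \{p, q\}, \{\neg p\}, \{r\}\}$ the unit $\{r\}$ is not start relevant because it takes part in no refutation, while both negative clauses are, in line with the remark that the negative clauses contain at least one start relevant clause.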
CTC or CTC_neg do not have specific inference rules for handling equality. Instead, when
dealing with equality, the axiomatization must be extended by the reflexivity, symmetry,
transitivity, and substitution axioms of the equality symbol. Admittedly, the use of an axiomatic
form of equality is far from optimal. But it is very difficult to develop methods for using
built-in equality in tableau calculi that yield convincing results in practice.

If a subgoal s becomes (after some inferences) head of a closed (sub-) tableau we call the 
obtained substitution a solution of s. 

The notion of a tableau derivation and a search tree is important: We say $T \vdash T'$ if (and
only if) tableau T' can be derived from T by applying a start rule if T is the trivial tableau,
or by an extension/reduction rule to a subgoal in T. The connection tableau calculus is not
(proof) confluent. In order to show the unsatisfiability of a clause set C, a search tree, given
as follows, has to be examined in a fair way (each node of the tree must be visited after a
finite amount of time) until a closed tableau occurs. A search tree $\mathcal{T}$ defined by a calculus
and a set of clauses C is given by a tree whose root is labeled with the trivial tableau. Every
inner node in $\mathcal{T}$ labeled with tableau T has as immediate successors the nodes from the
maximal set $\{v_1, \ldots, v_n\}$, where $v_i$ is labeled with $T_i$ and $T \vdash T_i$, $1 \le i \le n$.

Since not only the number of proof objects but also their size increases during the proof
process, explicit tableau enumeration procedures that construct all tableaux in $\mathcal{T}$ in a
breadth-first manner are not reasonable. Hence, implicit enumeration procedures that apply
consecutively bounded iterative deepening search with backtracking (Korf, 1985) are normally
used. In this approach iteratively larger finite initial parts of the search tree $\mathcal{T}$ are explored
with depth-first search. A finite segment is normally defined by a so-called completeness
bound (which poses structural restrictions on the tableaux which are allowed in the current
segment, see below) and a fixed natural number, a so-called resource. Iterative deepening
is performed by starting with a basic resource value $n \in \mathbb{N}$ and iteratively increasing n
until a proof is found within the finite initial segment of $\mathcal{T}$ defined by one bound and n.
Prominent examples of completeness bounds are the depth bound, inference bound, and
weighted-depth bound.

The depth bound limits the maximal depth of inner nodes (non-leaf nodes) in a tableau 
where the current resource n is the maximal depth permitted (the root node has depth 0). 
In practice, the depth bound is quite successful (Letz et al., 1994; Harrison, 1996) but it 
suffers from the large increase of the segment (defined by a resource n) caused by an increase 
of n. The inference bound allows a level by level exploration of the search tree (e.g., Stickel, 
1988). In comparison with the depth bound, the inference bound makes a smooth increase 
of the search space possible, but it is inferior to the depth bound in practice. In order to 
combine the advantages of the depth and the inference bound, the weighted-depth bound 
was introduced by Moser et al. (1997). This bound describes a class of possible bounds that 
restrict the tableau depth and the number of inferences allowed to infer a specific tableau. 
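To make the interplay of completeness bound and resource concrete, here is a minimal sketch of iterative deepening with the inference bound for ground Horn clauses. It assumes CTC_neg restricted to start and extension steps (as noted above, reduction is unnecessary for Horn clauses), literals written as strings with `~` for negation, and hypothetical function names; a real prover such as Setheo works on first-order clauses with unification and many refinements not shown here.

```python
def neg(lit):
    """Complement of a ground literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def solve(clauses, goals, budget):
    """Depth-first search with backtracking: can all open subgoals be closed
    with at most `budget` further extension steps?"""
    if not goals:
        return True  # closed tableau
    if budget == 0:
        return False  # resource exhausted in this segment
    first, rest = goals[0], goals[1:]
    for clause in clauses:
        if neg(first) in clause:  # connection to the selected subgoal
            new_goals = [l for l in clause if l != neg(first)]
            if solve(clauses, new_goals + rest, budget - 1):
                return True
    return False


def iterative_deepening(clauses, max_resource=20):
    """Return the smallest resource n (number of inferences, start step
    included) for which a closed connection tableau exists, else None."""
    clauses = [frozenset(c) for c in clauses]
    starts = [c for c in clauses if all(l.startswith("~") for l in c)]
    for n in range(1, max_resource + 1):
        for start in starts:
            if solve(clauses, sorted(start), n - 1):  # the start step costs one inference
                return n
    return None
```

Each value of n re-explores the whole segment from scratch; this repetition is the usual price of iterative deepening and is cheap compared to the growth of the segment itself.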

Goal-orientation of CTC, as our typical top-down calculus, is given by the connectedness 
condition. This condition entails that every literal in a tableau bears a relation to the start 
clause. The set of possible start clauses can normally be restricted to quite a small set of 
clauses which are sufficient in order to guarantee completeness (e.g., Moser et al., 1997), 
e.g. the set of negative clauses as already mentioned. Thus, only certain descendants having 
a connection to a start (goal) clause are enumerated. A main problem of proofs with CTC
is that in general they are rather long. In fact, CTC is among the weakest calculi when the
lengths of existing proofs are considered. Therefore, Letz et al. (1994) proposed extensions
of CTC which are based on a controlled integration of the cut rule. These extensions can
also be seen as restricted lemma mechanisms. A further problem is that during the search,
tableaux with the same or subsumed subgoals often occur repeatedly. Some extensions of the
calculus have been proposed to overcome such problems: e.g., Letz et al. (1994) introduced
a restricted subsumption concept and Astrachan and Stickel (1992) introduced caching techniques.

Figure 1: Cooperation between a top-down and a bottom-up prover

2.2 Achieving Cooperation by Preprocessing and Input Augmentation 

Our approach of integrating top-down and bottom-up provers by cooperation is
characterized by preprocessing and input augmentation. Henceforth, let C be the initial clause
set whose inconsistency should be shown. In the preprocessing phase the bottom-up
superposition prover generates a set of clauses $C_{bu}$ such that $\mathcal{C} \models C_{bu}$. Analogously, we
extract from a certain number of proof attempts of the ME prover a clause set $C_{td}$ such
that $\mathcal{C} \models C_{td}$. Then, the input set C of the superposition prover is augmented by $C_{td}$,
and the input of the ME prover by $C_{bu}$. After that both provers can proceed to work in
parallel. Figure 1 displays this kind of cooperation, which is essentially based on a sequential
concatenation of both provers. The approach can be seen as a specific instantiation of the
general cooperation approach TECHS (Denzinger & Fuchs, 1998).
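The architecture of Figure 1 can be summarized as a short orchestration sketch. Everything here is hypothetical: the generator, filter, and prover callables stand in for Spass and Setheo runs and for the filter functions discussed in this section. Real provers would run as separate operating-system processes that can be killed; this Python sketch uses threads and simply waits for the slower prover when the pool shuts down.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def cooperate(clauses, bu_generate, td_generate, phi_bu, phi_td,
              bu_prove, td_prove):
    """Preprocess, augment both inputs, then run both provers in parallel.

    Returns (which prover finished first, its result).
    """
    # Preprocessing: both sets contain only logical consequences of C.
    c_bu = phi_bu(bu_generate(clauses))   # filtered bottom-up lemmas
    c_td = phi_td(td_generate(clauses))   # filtered top-down subgoal clauses
    # Input augmentation: each prover receives the other's clauses.
    bu_input = list(clauses) + list(c_td)
    td_input = list(clauses) + list(c_bu)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(bu_prove, bu_input): "bottom-up",
                   pool.submit(td_prove, td_input): "top-down"}
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        first = next(iter(done))
        return futures[first], first.result()
```

The result of whichever prover finishes first is returned as-is, even if it reports failure; a fuller version would keep waiting for the other prover in that case.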

Since the superposition prover works on a search state which contains a set of clauses,
it is very easy to generate a set of valid clauses in a preprocessing phase. A very simple
method is to perform a fixed number i of activation steps and to generate the clause sets $\mathcal{F}^{A,i}$
and $\mathcal{F}^{P,i}$ of active and passive clauses. Then, we select all facts, i.e. positive unit clauses,
from $\mathcal{F}^{P,i}$. Since the inferences of a superposition-based prover are sound, it produces
only logical consequences of C. As we shall explain in Section 4 in more detail, it is not
sensible to add all generated positive units to the input of the top-down prover, i.e. to set
$C_{bu} = \{C : C \text{ is a fact}, C \in \mathcal{F}^{P,i}\}$. Instead, it is wise to select only some units with a
function $\varphi_{BU}$, i.e. $C_{bu} = \varphi_{BU}(\{C : C \text{ is a fact}, C \in \mathcal{F}^{P,i}\})$.
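The lemma generation step can be sketched as follows, assuming that the clauses produced by the i activation steps are available as a list of sets of literal strings (with `~` marking negation). The filter `phi_bu` shown here, which keeps the m facts with the fewest symbols, is only a hypothetical placeholder; the relevancy-based filters actually proposed are the subject of Section 4.

```python
import re


def is_fact(clause):
    """A fact is a positive unit clause."""
    return len(clause) == 1 and not next(iter(clause)).startswith("~")


def symbol_count(literal):
    """Number of predicate, function, constant, and variable symbols."""
    return len(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", literal))


def lemma_candidates(pool):
    """All facts among the generated clauses."""
    return [c for c in pool if is_fact(c)]


def phi_bu(facts, m=2):
    """Hypothetical filter: keep the m facts with the fewest symbols."""
    return sorted(facts, key=lambda c: (symbol_count(next(iter(c))), sorted(c)))[:m]
```

For instance, from the facts `p(a)`, `p(f(f(a)))`, and `q(f(a))` this placeholder filter keeps `p(a)` and `q(f(a))`.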

Because connection tableau-based provers have a search state which contains deductions 
(tableaux) instead of clauses, it is at first sight not obvious how to extract valid clauses 
from such a search state which will be well-suited for a superposition-based prover. A 
common method in order to extract valid clauses is to employ lemma mechanisms of ME
provers. Assume that a literal s is the label of the root node of a closed subtableau $T^*$. Let
$l_1, \ldots, l_n$ be the literals that are used in reduction steps for closing $T^*$ and that are outside
of $T^*$. Then, the clause $\{\sim s, \sim l_1, \ldots, \sim l_n\}$ may be derived as a new lemma (since it is a
logical consequence of the tableau clauses in $T^*$). Such a lemma could be transferred to a
bottom-up prover. As appealing as this idea sounds, it has some severe restrictions: Such
lemmas are usually not as general as they could be, e.g. due to instantiations which were
previously needed to close other branches. Hence, often they cannot be used in inferences,
and especially not in contracting inferences, which are very important for bottom-up provers.
Since these clauses are generated during the proof run in a rather unsystematic way, they
do not really introduce much goal-orientation and hence do not make use of the advantages
of the search scheme typical for ME.

The concept of subgoal clauses permits the generation of clauses derived by inferences 
involving a proof goal: 

Definition 2.1 (subgoal clause, subgoal clause set) 

1. Let C be a set of clauses, and let T be a tableau for C. A subgoal clause $S_T$ regarding T
is the clause $S_T = \{l_1, \ldots, l_n\}$, where the literals $l_i$ are the subgoals of the tableau T.

2. Let B be a bound, $n \in \mathbb{N}$ be a resource, and C be a set of clauses.

If CTC is used, the subgoal clause set $S^{B,n,\mathcal{C}}$ w.r.t. B, n, and C is defined by
$S^{B,n,\mathcal{C}} = \{S_T : T \text{ is a tableau in the initial segment of the search tree for } \mathcal{C} \text{ and CTC that is defined by } B \text{ and } n\} \setminus \mathcal{C}$.

If CTC_neg is in use, the subgoal clause set $S^{B,n,\mathcal{C}}_{neg}$ is the set
$S^{B,n,\mathcal{C}}_{neg} = \{S_T : S_T \in S^{B,n,\mathcal{C}} \text{ and the start expansion of } T \text{ is performed with a negative clause}\}$.

Note that subgoal clauses are valid clauses, i.e. logical consequences of the initial clause 
set. The following example illustrates our notion of subgoal clauses. 

Example 2.1 Let $\mathcal{C} = \{\{\neg q\}, \{\neg p_1, \ldots, \neg p_n, q\}, \{\neg q_1, \ldots, \neg q_n, q\}\}$. Then, $\{\neg p_1, \ldots, \neg p_n\}$
is the subgoal clause $S_T$ belonging to the tableau T obtained when extending the goal $\neg q$ with
the clause $\{\neg p_1, \ldots, \neg p_n, q\}$. If we employ B = inference bound (Inf) and resource k = 2,
then $S^{Inf,2,\mathcal{C}} = S^{Inf,2,\mathcal{C}}_{neg} = \{\{\neg p_1, \ldots, \neg p_n\}, \{\neg q_1, \ldots, \neg q_n\}\}$.

A subgoal clause $S_T$ represents a transformation of an original goal clause (which is the
start clause of the tableau T) into new subgoals realized by the deduction which led to
the tableau T. The set $S^{Inf,k,\mathcal{C}}$ is the set of all possible goal transformations into subgoal
clauses within k inferences if we consider all input clauses to be goal clauses; the set $S^{Inf,k,\mathcal{C}}_{neg}$
is the set of all possible goal transformations into subgoal clauses within k inferences if we
only consider the negative clauses to be goal clauses. More exactly, $S^{Inf,k,\mathcal{C}}$ is the closure
of all (goal) clauses w.r.t. (a fixed number k of) extension and reduction steps, and $S^{Inf,k,\mathcal{C}}_{neg}$ is
the closure of all negative (goal) clauses w.r.t. k extension and reduction steps.
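For ground clauses, Definition 2.1 with the inference bound can be turned into a small enumerator. This is a hedged sketch: it assumes propositional literals written as strings with `~` for negation, enumerates every connection tableau with at most k inferences (start, extension, and reduction) by branching over all subgoal and clause choices, and omits refinements such as regularity. The name `subgoal_clauses` is hypothetical. With n = 2 it reproduces the sets of Example 2.1.

```python
def neg(lit):
    """Complement of a ground literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def subgoal_clauses(clauses, k, neg_start=False):
    """Subgoal clause set S^{Inf,k,C} (or S_neg^{Inf,k,C} if neg_start)."""
    clauses = [frozenset(c) for c in clauses]
    found = set()

    def explore(goals, used):
        # goals: tuple of (open subgoal, tuple of its ancestor literals)
        found.add(frozenset(lit for lit, _ in goals))  # the subgoal clause S_T
        if used == k:
            return  # inference bound reached
        for i, (lit, anc) in enumerate(goals):
            rest = goals[:i] + goals[i + 1:]
            if neg(lit) in anc:  # reduction with a complementary ancestor
                explore(rest, used + 1)
            for d in clauses:  # extension: d must contain the complement of lit
                if neg(lit) in d:
                    new = tuple((l, anc + (lit,)) for l in d if l != neg(lit))
                    explore(rest + new, used + 1)

    for c in clauses:
        if neg_start and any(not l.startswith("~") for l in c):
            continue  # CTC_neg: start only with negative clauses
        explore(tuple((l, ()) for l in c), 1)  # the start step is the first inference
    return found - set(clauses)
```

With k = 1 only start tableaux exist, whose subgoal clauses are the input clauses themselves, so the result is empty; this is why a resource k > 1 is needed.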

In order to couple an ME and a superposition prover, we generate in the preprocessing
phase, with the inference bound and a fixed resource k > 1, either the set $S^{Inf,k,\mathcal{C}}$ or the set
$S^{Inf,k,\mathcal{C}}_{neg}$, depending on whether CTC or CTC_neg is used. It is not sensible to set
$C_{td} = S^{Inf,k,\mathcal{C}}$ or $C_{td} = S^{Inf,k,\mathcal{C}}_{neg}$, i.e. to augment the input of the bottom-up prover by this set
(see Section 3). Thus, we again use a filter function $\varphi_{TD}$ for selecting some subgoal clauses.
That is, $C_{td} = \varphi_{TD}(S^{Inf,k,\mathcal{C}}) \subseteq S^{Inf,k,\mathcal{C}}$ or $C_{td} = \varphi_{TD}(S^{Inf,k,\mathcal{C}}_{neg}) \subseteq S^{Inf,k,\mathcal{C}}_{neg}$.

Finally, we want to explain how our method (preprocessing and augmentation of the
input of the provers) is indeed well-suited for overcoming the disadvantages of the provers.

We start with the top-down prover. Usually the clauses $C_{bu}$ generated by a superposition
prover are quite general because specialized clauses are eliminated by subsumption or
rewriting. It is hence quite probable that they can often be used for closing open branches 
of tableaux enumerated by a top-down prover without instantiating the tableaux. If such 
a kind of lemma matching (e.g., Iwanuma, 1997) is possible we are able to close branches 
without introducing new subgoals or reducing the possibility that the remaining subgoals 
are solvable. If subgoals which often occur in tableaux can be solved, the lemmas are a 
good means for redundancy control. Moreover, since the search scheme of a superposition 
prover differs from that of an ME prover it is to be expected that it can generate clauses 
with few inferences that can close a branch which could only be closed by many inferences 
when using no lemmas. Then proof lengths are drastically reduced. See Section 4 for a 
more detailed description of the use of lemmas. 

The input of a superposition prover is augmented by subgoal clauses which are the result 
of a transformation of an original goal clause into subgoals. Hence, the goal-orientation of 
a superposition prover is increased if it uses such transformed goal clauses in its inferences. 
Since the ME prover employs a different search scheme it can very quickly conduct certain 
steps that the superposition prover cannot reconstruct because of its heuristic search. The 
search for a proof can then be reduced. 

3. Subgoal Clauses for Top-Down/Bottom-Up Integration 

In this section we examine the integration of subgoal clauses into the input set of a
superposition prover. At first we assume that all subgoal clauses generated within a certain
number of inferences are added to the input set, and we give some results regarding proof
length and search reduction. After that, we explain the necessity of selecting only some
subgoal clauses and introduce two selection methods based on the theoretical discussion.

3.1 Reduction of Proof Length and Search through Subgoal Clauses 

The cooperation method introduced in the preceding section gives rise to the question of 
whether a proof length reduction is possible, i.e. whether there are shorter superposition 
proofs of the inconsistency of $\mathcal{C} \cup S^{Inf,k,\mathcal{C}}$ or $\mathcal{C} \cup S^{Inf,k,\mathcal{C}}_{neg}$ than of the inconsistency of $\mathcal{C}$.
(Note that we measure the length of a proof P by the number $|P|$ of inference steps
in it.) Since some inference steps are necessary for enumerating subgoal clauses, we
should try to find out whether these inferences can be saved when using the clauses.

This question is mainly of theoretical interest because in general bottom-up provers 
do not enumerate minimal proofs. Moreover, bottom-up provers usually perform a lot 
of unnecessary inferences. It is more important to analyze whether a bottom-up prover 
can benefit from the use of subgoal clauses in the form of a proof search reduction, i.e.,
a reduction of the number of inferences the prover needs in order to find a proof. It is
particularly interesting to identify the cases of maximal proof search reduction. 

3.1.1 Subgoal clause generation via CTC 

Firstly, we examine the case where we employ the calculus CTC, i.e. generate $S^{Inf,k,\mathcal{C}}$. We
assume further that no equality is involved in the problem, i.e. superposition corresponds 
to (ordered) resolution. 

Theorem 3.1 

1. Let $\mathcal{C}$ be a set of ground clauses not containing equality, let $\Box \notin \mathcal{C}$, and let k > 1 be
a natural number. Let $P_1$ and $P_2$ be minimal length resolution refutation proofs for $\mathcal{C}$
and $\mathcal{C} \cup S^{Inf,k,\mathcal{C}}$, respectively. Then, it holds: $|P_1| > |P_2|$.

2. For each k > 1 there is a set of (non-ground) clauses $\mathcal{C}_k$ not containing equality
($\Box \notin \mathcal{C}_k$), such that no minimal length resolution refutation proof for $\mathcal{C}_k \cup S^{Inf,k,\mathcal{C}_k}$ is
shorter than a minimal length resolution refutation proof for $\mathcal{C}_k$.

Proof: 

1. Note that no factorization steps are needed in the case of ground clauses (recall that
clauses are sets of literals). Then, the claim is trivial since the result of the first
resolution step of each minimal proof is an element of $S^{Inf,k,\mathcal{C}}$.

2. Let k > 1, and let $\mathcal{C}_k = \{\{\neg p(x_1), \ldots, \neg p(x_k)\}, \{p(y_1), \ldots, p(y_k)\}\}$.
Let $\succ = \emptyset$ be the ordering used for ordered resolution. Then, a minimal resolution
refutation proof for $\mathcal{C}_k$ requires $k-1$ binary factorization steps (resulting in the clause
$\{p(y_1)\}$) and k resolution steps. Furthermore, it can easily be recognized that
$S^{Inf,k,\mathcal{C}_k}$ contains only clauses with at least one positive and one negative literal.
Thus, none of these clauses can lead to a refutation proof for $\mathcal{C}_k \cup S^{Inf,k,\mathcal{C}_k}$ in fewer than
$2k-1$ inferences. □

Hence, at least in the ground case, a reduction of the proof length is guaranteed. However, the
(heuristic) proof search of a resolution-based prover may not profit from the proof length
reduction obtained. For example, it is possible that all clauses of a minimal refutation proof
of $\mathcal{C}$ have smaller heuristic weights than the clauses from $S^{Inf,k,\mathcal{C}}$ and will hence be activated
before them. Consider the following example:

Example 3.1 Let $\succ = \emptyset$ be the ordering used for ordered resolution. Let the clause set $\mathcal{C}$ be
given by $\mathcal{C} = \{\{\neg a, \neg b, c\}, \{\neg q, b\}, \{a\}, \{q\}, \{\neg c\}\}$. The heuristic $\mathcal{H}$ corresponds to the FIFO
heuristic. Furthermore, for the first n activation steps ($n \in \mathbb{N}$, n > 9), resolvents of the two
most recently activated clauses are preferred by $\mathcal{H}$. Then, the following clauses are activated
by the prover (in this order): $\{\neg a, \neg b, c\}, \{\neg q, b\}, \{\neg a, \neg q, c\}, \{a\}, \{\neg q, c\}, \{q\}, \{c\}, \{\neg c\}, \Box$.
Furthermore, if the subgoal clauses of $S^{Inf,k,\mathcal{C}}$ are inserted behind the original axioms, the
prover will find the same resolution refutation proof for $\mathcal{C} \cup S^{Inf,k,\mathcal{C}}$ (k > 0), and the proof
search does not benefit from a possible proof length reduction.
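Theorem 3.1(1) can be checked on the ground clause set of this example by brute force. The sketch below (hypothetical helper names, exponential, toy sizes only) computes the length of a minimal unrestricted resolution refutation, i.e. with $\succ = \emptyset$, by breadth-first search over sets of derived clauses. The four clauses passed as $S^{Inf,2,\mathcal{C}}$ in the check were worked out by hand from Definition 2.1; for them the minimal length drops from 4 to 2, although, as the example shows, a FIFO-style heuristic need not exploit this.

```python
from itertools import combinations


def neg(lit):
    """Complement of a ground literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """Binary resolvents of two ground clauses, skipping tautologies."""
    for lit in c1:
        if neg(lit) in c2:
            r = (c1 - {lit}) | (c2 - {neg(lit)})
            if not any(neg(x) in r for x in r):
                yield frozenset(r)


def min_refutation_length(clauses, max_steps=8):
    """Fewest resolution steps needed to derive the empty clause, or None."""
    start = frozenset(frozenset(c) for c in clauses)
    if frozenset() in start:
        return 0
    frontier, seen = {start}, {start}
    for steps in range(1, max_steps + 1):
        nxt = set()
        for state in frontier:  # state = input clauses plus derived clauses
            for c1, c2 in combinations(state, 2):
                for r in resolvents(c1, c2):
                    if not r:
                        return steps  # empty clause derived
                    if r not in state:
                        new = state | {r}
                        if new not in seen:
                            seen.add(new)
                            nxt.add(new)
        frontier = nxt
    return None
```

Skipping tautologies does not affect the minimum, since resolving with a tautology only ever yields a superset of the other parent clause.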
Since the above example (especially the chosen heuristic) is somewhat contrived, it can
be expected that for many problems clauses from $S^{Inf,k,\mathcal{C}}$ will be activated. In this case
there is also no guarantee that the proof search is improved because the subgoal clauses can 
import additional redundancy. 

CTC differs from the superposition calculus, apart from the lack of factorization, mainly in the
handling of equality. In the case that equality is involved in the problem, a
proof length reduction is not guaranteed, even for ground clauses.

Theorem 3.2 For each resource k > 1 there is a set $\mathcal{C}$ of ground unit equations ($\Box \notin
\mathcal{C}$) where the minimal superposition refutation proofs for $\mathcal{C} \cup S^{Inf,k,\mathcal{C}}$ are not shorter than
minimal proofs for $\mathcal{C}$.

Proof: Consider the set of unit equations $\mathcal{C} = \{\{a = b\}, \{f^{k-1}(a) \neq f^{k-1}(b)\}\}$, and assume
that the ordering $\succ$ used for superposition satisfies $b \succ a$. Then, a minimal superposition refutation
proof for $\mathcal{C}$ requires two inferences: a superposition step into $f^{k-1}(a) \neq f^{k-1}(b)$ resulting in the
inequation $f^{k-1}(a) \neq f^{k-1}(a)$, and then an equality resolution step. The set $S^{Inf,k,\mathcal{C}}$ contains
either non-unit clauses whose refutation requires at least 2 inferences or the units
$\mathcal{U} = \{\{f^i(a) \neq f^i(b)\}, \{f^j(a) = f^j(b)\} : 0 \le i < k-1,\ 0 < j \le k-1\}$. Since the refutation of
$\mathcal{C} \cup \mathcal{U}$ also requires 2 inferences, a reduction of the proof length is impossible. □

In summary, a reduction of the heuristic search for a proof cannot be guaranteed because
the proof length may not be reducible, the subgoal clauses may be ignored by the
superposition prover, or the subgoal clauses may import too much additional redundancy.
The latter two points also hold in the ground case without equality, where at least a proof
length reduction is guaranteed.

The second point is no real problem since, as our experiments showed, subgoal clauses are
usually activated and involved in the search process of a prover. The risk that subgoal
clauses are ignored can be minimized by selecting especially those subgoal clauses which
will take part in the search with a high probability, e.g. clauses with a small heuristic
weight regarding the heuristic of a superposition prover. According to our experimental
results, however, our selection process does not need such a focus. The first and third
points reveal some theoretical weaknesses, but in connection with appropriate
relevancy-based selection techniques we could observe in practice that the restructuring of
the search caused by using subgoal clauses allows proofs to be found faster.

3.1.2 Subgoal clause generation via CTC_neg

Secondly, we examine the case where we employ the calculus CTC_neg, i.e. generate
$S^{Inf,k,C}_{neg}$. Then, even for ground clauses not containing equality, it is possible that
minimal proofs cannot be shortened when employing subgoal clauses.

Theorem 3.3 There is a set of ground clauses C where no minimal length resolution
refutation proof for $C \cup S^{Inf,k,C}_{neg}$ is shorter than a minimal length resolution refutation
proof for C.
Proof: Consider the following set C of clauses (again we employ ≻ = ∅):

$$
C = \{\{l_4, l_6, l_7\}, \{l_4, l_6, \neg l_7\}, \{l_3, \neg l_4\}, \{l_3, \neg l_6\}, \{\neg l_2, \neg l_4\},
\{l_4, \neg l_5, \neg l_6\}, \{\neg l_2, l_5\}, \{l_1, l_2, \neg l_3\}, \{\neg l_1, l_2, \neg l_3\}\}
$$



Each minimal refutation proof for this set requires 9 resolution steps, e.g.: 



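$$
\begin{array}{lll@{\qquad}lll}
[1] & \text{ax} & \{l_4, l_6, l_7\} & [10] & \text{res}(1,2) & \{l_4, l_6\} \\
[2] & \text{ax} & \{l_4, l_6, \neg l_7\} & [11] & \text{res}(6,10) & \{l_4, \neg l_5\} \\
[3] & \text{ax} & \{l_3, \neg l_4\} & [12] & \text{res}(3,10) & \{l_3, l_6\} \\
[4] & \text{ax} & \{l_3, \neg l_6\} & [13] & \text{res}(7,11) & \{\neg l_2, l_4\} \\
[5] & \text{ax} & \{\neg l_2, \neg l_4\} & [14] & \text{res}(5,13) & \{\neg l_2\} \\
[6] & \text{ax} & \{l_4, \neg l_5, \neg l_6\} & [15] & \text{res}(4,12) & \{l_3\} \\
[7] & \text{ax} & \{\neg l_2, l_5\} & [16] & \text{res}(8,9) & \{l_2, \neg l_3\} \\
[8] & \text{ax} & \{l_1, l_2, \neg l_3\} & [17] & \text{res}(14,16) & \{\neg l_3\} \\
[9] & \text{ax} & \{\neg l_1, l_2, \neg l_3\} & [18] & \text{res}(15,17) & \square
\end{array}
$$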



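Now, the following holds:

$$
S^{Inf,2,C}_{neg} = \{\{\neg l_2, l_6, l_7\}, \{\neg l_1, \neg l_3, \neg l_4\}, \{\neg l_2, l_6, \neg l_7\},
\{\neg l_2, \neg l_5, \neg l_6\}, \{l_1, \neg l_3, \neg l_4\}\}
$$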



When enumerating all proofs for $C \cup S^{Inf,2,C}_{neg}$, one can recognize that the minimal proof
length cannot be reduced. □
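The mechanically checkable parts of this proof can be replayed in a few lines. The Python sketch below checks that every step of the listed refutation is a binary resolvent of its premises and recomputes $S^{Inf,2,C}_{neg}$. It assumes that a subgoal clause obtained with inference resource 2 arises from a negative start clause plus one extension step (reduction steps and regularity are not modelled); the minimality claims themselves are not checked, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Clause = FrozenSet[str]


def neg(lit: str) -> str:
    """'-l4' is the negation of 'l4' and vice versa."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def resolvents(c1: Clause, c2: Clause) -> Set[Clause]:
    return {(c1 - {l}) | (c2 - {neg(l)}) for l in c1 if neg(l) in c2}


def cl(*lits: str) -> Clause:
    return frozenset(lits)


# Axioms [1]-[9] of the proof of Theorem 3.3.
C: Dict[int, Clause] = {
    1: cl("l4", "l6", "l7"), 2: cl("l4", "l6", "-l7"), 3: cl("l3", "-l4"),
    4: cl("l3", "-l6"), 5: cl("-l2", "-l4"), 6: cl("l4", "-l5", "-l6"),
    7: cl("-l2", "l5"), 8: cl("l1", "l2", "-l3"), 9: cl("-l1", "l2", "-l3"),
}

# Steps [10]-[18]: (premise, premise, claimed resolvent).
STEPS: Dict[int, Tuple[int, int, Clause]] = {
    10: (1, 2, cl("l4", "l6")), 11: (6, 10, cl("l4", "-l5")),
    12: (3, 10, cl("l3", "l6")), 13: (7, 11, cl("-l2", "l4")),
    14: (5, 13, cl("-l2")), 15: (4, 12, cl("l3")),
    16: (8, 9, cl("l2", "-l3")), 17: (14, 16, cl("-l3")),
    18: (15, 17, cl()),
}


def check_refutation() -> bool:
    known = dict(C)
    for n, (i, j, claimed) in STEPS.items():  # dicts keep insertion order
        if claimed not in resolvents(known[i], known[j]):
            return False
        known[n] = claimed
    return known[18] == frozenset()


def subgoal_clauses_k2(clauses: Iterable[Clause]) -> Set[Clause]:
    """Negative start clause plus one extension step (no reduction steps)."""
    clauses = list(clauses)
    result: Set[Clause] = set()
    for start in clauses:
        if start and all(l.startswith("-") for l in start):
            for lit in start:
                for side in clauses:
                    if neg(lit) in side:
                        result.add((start - {lit}) | (side - {neg(lit)}))
    return result


if __name__ == "__main__":
    print(check_refutation())  # True
    print(sorted(sorted(c) for c in subgoal_clauses_k2(C.values())))
```

Only clause [5] is negative, so all five subgoal clauses stem from extending either ¬l_2 (with [8] or [9]) or ¬l_4 (with [1], [2] or [6]).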

In the case where equality is involved in our proof problems, we obtain a theorem 
analogous to the previous one: 

Theorem 3.4 For each resource k > 1 there is a set of unit equations C where the minimal
superposition refutation proofs for $C \cup S^{Inf,k,C}_{neg}$ are not shorter than minimal proofs for C.

Proof: In analogy to Theorem 3.2. □ 
We can recognize that for CTC_neg the results are essentially the same as for CTC: in
general, a reduction of the heuristic search for a proof cannot be guaranteed, and proof
lengths may not always be reduced. In practice, however, we again observed that a
restructuring of the search caused by subgoal clauses often allows proofs to be found faster.

In the following we simply assume that inferences with subgoal clauses are not omitted 
by a superposition prover, i.e. that they are involved in the proof search. We want to 
deal in more detail with the problem of identifying subgoal clauses which can lead to a 
large reduction of the search effort and how we can efficiently select such subgoal clauses. 
This problem has to be tackled with heuristics since there is — as we have examined — no 
theoretical guarantee and also no method to decide whether subgoal clauses are useful. 



3.2 Relevancy-Based Generation of Subgoal Clauses 

Even when using small resources k, the sets $S^{Inf,k,C}$ and $S^{Inf,k,C}_{neg}$ can become quite large.
Thus, it is not sensible to integrate all subgoal clauses from $S^{Inf,k,C}$ or $S^{Inf,k,C}_{neg}$ into the
search state of a superposition-based prover. Integrating too many clauses usually does not
entail a favorable rearrangement of the search because the heuristic "gets lost" in the huge 
number of clauses which can be derived from many subgoal clauses. It is hence reasonable 
to develop techniques for filtering subgoal clauses that appear to entail a large gain in 
efficiency for a superposition prover if they can be proven. That is, we are interested in 
filtering relevant subgoal clauses. As already described in Section 2, we generate a set of 
subgoal clause candidates and then we select some subgoal clauses from this set. The chosen 
subgoal clauses are added to the search state of the bottom-up prover. In the following, 
we will at first introduce some criteria for measuring the relevancy of a clause. Then, we 
shall introduce two techniques for generating subgoal clause candidates and deal with the 
selection of relevant subgoal clauses. 

3.2.1 Relevancy Criteria for Subgoal Clauses 

Two main characteristics of subgoal clauses can contribute to a speed-up of the proof search. 

Firstly, since according to Section 3.1 subgoal clauses introduce additional redundancy 
it is important that some of the clauses can be proven quite easily, that is more easily than 
the original goal(s). In order to estimate this, it is necessary to judge whether they can 
probably be solved with the help of clauses of the input set. Measuring similarity between 
a goal and other clauses with the techniques developed by Denzinger and Fuchs (1994), 
Denzinger et al. (1997), or Fuchs (1997) may be well-suited for this estimation. 

Secondly, a solution of a newly introduced subgoal clause should not always entail a 
solution of an original goal within few steps of the superposition-based prover. If this were 
the case then the integration of new subgoal clauses would not promise much gain. Criteria 
in order to estimate this are: Firstly, the transformation of an original goal clause into a 
subgoal clause by an ME prover should have been performed by using many inferences, i.e. 
k should be quite high. Then, a solution of a new subgoal clause usually does not entail 
a solution of an original goal within few steps because the probability is rather high that 
a bottom-up prover cannot — due to its heuristic search — quickly reconstruct the inferences 
needed to infer the original goal. Secondly, if there is a subgoal clause $S_T$ and some of
the tableau clauses of the tableau T have a high heuristic weight w.r.t. the heuristic of the
superposition-based prover, a high gain of efficiency can be expected if the prover can prove
$S_T$. This is due to the fact that the inferences needed to infer the original goal using $S_T$ are
difficult for the superposition-based prover.

3.2.2 Efficient Generation and Selection of Subgoal Clauses 

In order to generate a set of interesting subgoal clauses it is important that we employ a 
large resource for generating subgoal clauses. As we have already discussed, subgoal clauses 
that are generated with a small number of inferences do not promise much gain because 
a bottom-up prover may easily reconstruct the inferences needed to infer them. However, 
it is not possible to generate all subgoal clauses $S^{Inf,k,C}$ or $S^{Inf,k,C}_{neg}$ for a sufficiently large
resource k as subgoal clause candidates, because their huge number means that the costs of
generation and additional selection are too high. Hence, we must restrict ourselves to a set
of subgoal clause candidates that is a subset of $S^{Inf,k,C}$ or $S^{Inf,k,C}_{neg}$, k sufficiently large (see
Section 5).
Figure 2: Inference-based generation of a set of subgoal clause candidates

Our first variant, an inference-based method, starts by generating subgoal clauses from
$S^{Inf,k,C}$ ($S^{Inf,k,C}_{neg}$) with a rather large resource k and stops when $N_{sg}$ subgoal clause
candidates are generated. The advantage of this method is that it is very simple and can be efficiently
implemented. Tableaux are enumerated with a fixed strategy for selecting subgoals for 
inferences (usually left-most/depth-first) and for each tableau its subgoal clause is stored. 
The main disadvantage of this method is that due to the fixed strategy and the limit of 
the number of subgoal clauses, we only obtain subgoal clauses which are inferred from 
goal clauses by expanding particular subgoals with a large number of inferences and other 
subgoals with only a small number of inferences. (See also Figure 2: Ovals are tableaux
in a finite segment of the search tree T, and the lines represent the ⊢ relation. Grey ovals
represent enumerated tableaux, i.e. their subgoal clauses are stored; white ovals represent
tableaux which are not enumerated.) Thus, the method is somewhat unintelligent because
no information about the quality of the transformation of an original goal clause into a
subgoal clause is used. Certain transformations are favored over others only due to the
uninformed subgoal selection strategy.
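A minimal sketch of the inference-based variant for ground clauses, in Python. It assumes that every inference is either the start step with a negative clause (as in CTC_neg) or an extension step on the left-most open subgoal, and it ignores reduction steps and regularity; `inference_based_candidates` and its parameters are hypothetical names. Tableaux are enumerated depth-first, the subgoal clause of each enumerated tableau is stored, and enumeration stops as soon as `n_sg` candidates exist, which makes the bias of the fixed selection strategy visible.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Clause = FrozenSet[str]


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def inference_based_candidates(clauses: Sequence[Set[str]], k: int,
                               n_sg: int) -> List[Clause]:
    """Depth-first enumeration of ground tableaux with at most k inferences
    (start step + extension steps on the left-most open subgoal); the subgoal
    clause of every enumerated tableau is stored until n_sg are found."""
    axioms = [frozenset(c) for c in clauses]
    found: List[Clause] = []

    def visit(subgoals: Tuple[str, ...], used: int) -> bool:
        sg_clause = frozenset(subgoals)
        if sg_clause not in found:
            found.append(sg_clause)
            if len(found) >= n_sg:
                return True  # limit reached: abort the whole enumeration
        if used >= k or not subgoals:
            return False
        first, rest = subgoals[0], subgoals[1:]
        for side in axioms:  # fixed clause order
            if neg(first) in side:
                # New subgoals go in front: depth-first, left-most selection.
                new = tuple(sorted(side - {neg(first)})) + rest
                if visit(new, used + 1):
                    return True
        return False

    for start in axioms:
        if start and all(l.startswith("-") for l in start):  # negative start clauses
            if visit(tuple(sorted(start)), 1):
                break
    return found


if __name__ == "__main__":
    C = [{"-a", "-b"}, {"a", "c"}, {"b"}, {"-c"}]
    print([sorted(c) for c in inference_based_candidates(C, k=2, n_sg=10)])
    # [['-a', '-b'], ['-b', 'c'], ['-c'], ['a']]
```

With `k=2`, the subgoal `-b` of the start clause `{-a, -b}` is never extended, because the left-most subgoal `-a` is always expanded first; with a small `n_sg`, later start clauses are not even reached.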

Our second variant, an adaptive method, tries to overcome these disadvantages in the 
following way: Instead of permitting more inferences when generating subgoal clauses due 
to an uninformed subgoal selection strategy, we want to allow more inferences at certain 
interesting positions of the search tree T for a given set of clauses C. 

In detail, our approach is as follows: At first, we generate all subgoal clauses $S^{Inf,k_1,C}$
or $S^{Inf,k_1,C}_{neg}$ with a resource $k_1$ which is smaller than the resource used in the first
variant. Then, a fixed number $N_{ref}$ of subgoal clauses is chosen which promise the highest
gain of efficiency regarding the previously mentioned criteria. More exactly, we choose
subgoal clauses which are maximal w.r.t. a selection function ψ. One possible realization of ψ is:

$$
\psi(S_T) = \alpha_1 \cdot I(S_T) + \alpha_2 \cdot \max(\{\mathcal{H}(C) : C \text{ is a tableau clause in } T\})
+ \alpha_3 \cdot \max(\{\mathit{sim}(S_T, C) : C \in \mathcal{C}, |C| = 1\})
$$

The higher the number of inferences $I(S_T)$ which are needed to infer $S_T$, the higher
$\psi(S_T)$ should be. Hence, $\alpha_1$ should be positive. Setting $\alpha_2 > 0$ is also sensible: if there
are tableau clauses in T which have a high heuristic weight regarding the heuristic ℋ of
the superposition-based prover, we can, as already discussed, gain a lot of efficiency. The
function sim measures whether literals from $S_T$ can probably be solved with unit clauses
from C. It maps a pair of clauses to a real number: the larger $\mathit{sim}(S_T, C)$, the larger
the similarity between $S_T$ and the unit clause C. We utilize a variant of the function
occnest, which is defined by Denzinger and Fuchs (1994), to accomplish this task. We set
$\alpha_1 > \alpha_2 > \alpha_3 > 0$ due to the increasing vagueness of the criteria.

Figure 3: Adaptive generation of a set of subgoal clause candidates
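The selection function ψ can be transcribed directly. In the Python sketch below, the heuristic ℋ and the similarity function sim are passed in as parameters; the stand-ins used in the example (clause length for ℋ, number of shared atoms for sim) are hypothetical and are not the occnest variant mentioned above, and the default weights 3 > 2 > 1 merely respect $\alpha_1 > \alpha_2 > \alpha_3 > 0$.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Clause = FrozenSet[str]


@dataclass(frozen=True)
class Candidate:
    subgoal_clause: Clause                # S_T
    inferences: int                       # I(S_T)
    tableau_clauses: Tuple[Clause, ...]   # clauses occurring in the tableau T


def psi(c: Candidate, units: Sequence[Clause],
        h: Callable[[Clause], float],
        sim: Callable[[Clause, Clause], float],
        a1: float = 3.0, a2: float = 2.0, a3: float = 1.0) -> float:
    """a1*I(S_T) + a2*max H(C) over tableau clauses + a3*max sim(S_T, C) over units."""
    if not a1 > a2 > a3 > 0:
        raise ValueError("expected a1 > a2 > a3 > 0")
    weight = max((h(t) for t in c.tableau_clauses), default=0.0)
    similarity = max((sim(c.subgoal_clause, u) for u in units), default=0.0)
    return a1 * c.inferences + a2 * weight + a3 * similarity


def select_ref(cands: Sequence[Candidate], units: Sequence[Clause],
               h, sim, n_ref: int) -> List[Candidate]:
    """The n_ref candidates that are maximal w.r.t. psi (the set M_ref)."""
    return sorted(cands, key=lambda c: psi(c, units, h, sim), reverse=True)[:n_ref]


# Hypothetical stand-ins for the prover heuristic H and the similarity sim.
def clause_length(c: Clause) -> float:
    return float(len(c))


def _atoms(c: Clause) -> Set[str]:
    return {l.lstrip("-") for l in c}


def shared_atoms(s: Clause, u: Clause) -> float:
    return float(len(_atoms(s) & _atoms(u)))


if __name__ == "__main__":
    A = Candidate(frozenset({"-p"}), 2, (frozenset({"p", "-q"}),))
    B = Candidate(frozenset({"-q", "-r"}), 3, (frozenset({"q", "-r"}),))
    units = [frozenset({"p"})]
    print(psi(A, units, clause_length, shared_atoms))  # 11.0
    print(psi(B, units, clause_length, shared_atoms))  # 13.0
```

Here B wins although only A resembles an input unit, because the first two criteria are weighted more heavily, mirroring the ordering of the α coefficients.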

Now, let $M_{ref} \subseteq S^{Inf,k_1,C}$ or $M_{ref} \subseteq S^{Inf,k_1,C}_{neg}$ be the set of chosen subgoal clauses.
Then, we generate subgoal clauses with a resource $k_2$ but employ the clauses from $M_{ref}$ as
start clauses for the subgoal clause enumeration. We call the set of subgoal clauses generated
with this method $S^{Inf,k_2,C,M_{ref}}$ ($S^{Inf,k_2,C,M_{ref}}_{neg}$). (Consider also Figure 3: The dotted line
shows which subgoal clauses are generated with resource $k_1$. Then some of them are selected
(black ovals) and used as starting points for the generation of new subgoal clauses with
resource $k_2$.) The resource $k_2$ should again not be too high in order to allow a fast
enumeration of the subgoal clauses. The set of subgoal clause candidates is then given by
$S^{Inf,k_1,C} \cup S^{Inf,k_2,C,M_{ref}}$ (if we employ CTC) or by $S^{Inf,k_1,C}_{neg} \cup S^{Inf,k_2,C,M_{ref}}_{neg}$ (if we
employ CTC_neg). Thus, subgoal clause candidates are, on the one hand, all subgoal clauses
derived with a certain number $k_1$ of inferences, such that it can be assumed that the proof
length is reduced. On the other hand, we have some subgoal clause candidates which are
derived with a higher number of inferences, at most $k_1 + k_2$. These subgoal clauses promise
a high gain of efficiency because they are derived from subgoal clauses selected with function
ψ, i.e. from clauses which are considered to be very relevant for a superposition-based theorem
prover.
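The two-phase structure of the adaptive method can be sketched independently of the calculus. In the following hypothetical Python sketch, `expand` returns the successors of a subgoal clause after one further inference and `score` plays the role of ψ; both are placeholders for the ME machinery, and counting resources as expansion rounds after the start clauses is a simplifying assumption.

```python
from typing import Callable, Hashable, Iterable, List, Set, TypeVar

S = TypeVar("S", bound=Hashable)


def within(seeds: Iterable[S], expand: Callable[[S], Iterable[S]],
           rounds: int) -> Set[S]:
    """All states reachable from the seeds in at most `rounds` expansions."""
    seen: Set[S] = set(seeds)
    frontier: List[S] = list(seen)
    for _ in range(rounds):
        # dict.fromkeys removes duplicates while keeping the order
        frontier = list(dict.fromkeys(
            n for s in frontier for n in expand(s) if n not in seen))
        seen.update(frontier)
    return seen


def adaptive_candidates(starts: Iterable[S], expand: Callable[[S], Iterable[S]],
                        score: Callable[[S], float],
                        k1: int, k2: int, n_ref: int) -> Set[S]:
    """Phase 1 with resource k1, then resource k2 only from the n_ref best states."""
    phase1 = within(starts, expand, k1)                       # like S^{Inf,k1,C}
    m_ref = sorted(phase1, key=score, reverse=True)[:n_ref]   # like M_ref
    phase2 = within(m_ref, expand, k2)                        # like S^{Inf,k2,C,M_ref}
    return phase1 | phase2


if __name__ == "__main__":
    result = adaptive_candidates([1], lambda x: [2 * x, 2 * x + 1], lambda x: x,
                                 k1=2, k2=1, n_ref=1)
    print(sorted(result))  # [1, 2, 3, 4, 5, 6, 7, 14, 15]
```

The states are integers only to keep the example runnable: phase one yields 1 to 7, only the best state 7 is refined further, so the deeper candidates 14 and 15 come from a single promising branch, just as candidates with up to $k_1 + k_2$ inferences stem only from clauses in $M_{ref}$.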

For selecting subgoal clauses from the set of subgoal clause candidates, we employ a
function φ, which is mainly based on the function ψ, and select the clauses with the highest
weight regarding φ. Besides ψ, φ uses a function θ, which simply counts a weighted sum of
the number of variables in $S_T$ and two times the number of function or predicate symbols in
$S_T$. Hence, quite "general" subgoal clauses are preferred. This is sensible because they can
usually be solved more easily.
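The function θ (the weighted symbol count described above) is easy to compute. The Python sketch below assumes a hypothetical term representation (variables are capitalised strings, constants are lower-case strings, compound terms and atoms are tuples whose first element is the symbol), counts variable occurrences rather than distinct variables, and treats constants as function symbols; how θ is combined with ψ in φ is not shown.

```python
from typing import Iterable, Tuple, Union

Term = Union[str, tuple]  # 'X' variable, 'a' constant, ('f', arg, ...) compound


def counts(t: Term) -> Tuple[int, int]:
    """(#variable occurrences, #function/predicate symbol occurrences) in t."""
    if isinstance(t, str):
        return (1, 0) if t[:1].isupper() else (0, 1)
    variables, symbols = 0, 1  # the head symbol of t
    for arg in t[1:]:
        v, s = counts(arg)
        variables += v
        symbols += s
    return variables, symbols


def theta(clause: Iterable[Tuple[bool, Term]]) -> int:
    """#variables + 2 * #function/predicate symbols; literals are (negated, atom)."""
    variables = symbols = 0
    for _negated, atom in clause:
        v, s = counts(atom)
        variables += v
        symbols += s
    return variables + 2 * symbols


if __name__ == "__main__":
    print(theta([(True, ("p", "X", ("f", "a")))]))  # 7
    print(theta([(True, ("p", "X", "Y"))]))         # 4
```

The more general subgoal ¬p(X, Y) receives the smaller value, which is the direction in which the selection prefers subgoal clauses.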



4. The Use of Lemmas in Model Elimination 

In this section we examine theoretical and practical aspects concerning the integration of 
superposition-generated lemmas into the input set of a model elimination prover. At first, 
we present some results regarding proof length and search reductions. As before, we measure 
the proof length by the number of inferences in it. Proof search is measured by the number 
of inferences the prover must perform in order to find a proof. Then, we introduce some 
methods for a relevancy-based filtering of lemmas. 



4.1 Proof Length Reduction 

When adding positive unit lemmas $C_{BU}$ of a bottom-up prover to the axiomatization C of a
top-down prover, a proof length reduction is possible if the following situation occurs: if T
is a tableau that represents a proof and sg is a literal on a branch of depth n in T such that sg
has a depth smaller than n − 1, we can reduce the proof length if ¬sg is unifiable with a lemma
$l \in C_{BU}$. Hence, we are interested in the question of whether we can find lemmas useful in
the described sense if we choose $C_{BU} = \varphi_{BU}(\{C : C \text{ is a fact}, C \in \mathcal{F}^{\mathcal{H},i}\})$ as proposed in
Section 2. Unfortunately, even if $\varphi_{BU}$ selects all facts in $\{C : C \text{ is a fact}, C \in \mathcal{F}^{\mathcal{H},i}\}$ and i
is arbitrarily large, there is no guarantee that we can find a useful lemma in this set. This is
even true if the bottom-up prover employs a fair heuristic.
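Whether a lemma l can close a subgoal sg reduces to a unification test of ¬sg against l, the same test that has to be repeated in every applicability test. A minimal Python sketch of syntactic unification with occurs check, assuming a hypothetical term representation in which variables are capitalised strings and compound terms are tuples `(symbol, arg1, ...)`, and assuming that subgoal and lemma have been renamed apart:

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical term representation: variables are capitalised strings,
# constants are other strings, compound terms are tuples (symbol, arg1, ...).
Term = Union[str, tuple]
Subst = Dict[str, Term]


def is_var(t: Term) -> bool:
    return isinstance(t, str) and t[:1].isupper()


def walk(t: Term, s: Subst) -> Term:
    """Follow variable bindings in s until an unbound variable or non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v: str, t: Term, s: Subst) -> bool:
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(a: Term, b: Term) -> Optional[Subst]:
    """Most general unifier of a and b (as a triangular substitution), or None."""
    s: Subst = {}
    stack = [(a, b)]
    while stack:
        x, y = (walk(t, s) for t in stack.pop())
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, s):
                return None  # occurs check, e.g. X vs f(X)
            s[x] = y
        elif is_var(y):
            stack.append((y, x))
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None  # symbol clash
    return s


def lemma_closes(subgoal: Tuple[bool, Term], lemma: Term) -> bool:
    """A positive unit lemma closes subgoal sg iff ¬sg unifies with it."""
    positive, atom = subgoal
    return not positive and unify(atom, lemma) is not None


if __name__ == "__main__":
    # Subgoal ¬p(a, f(Y)) and lemma p(X, f(b)):
    print(lemma_closes((False, ("p", "a", ("f", "Y"))), ("p", "X", ("f", "b"))))  # True
```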

Theorem 4.1 For each i ∈ ℕ there is a clause set $C_i$ and a fair heuristic $\mathcal{H}_i$ such that
no positive unit lemma from $\mathcal{F}^{\mathcal{H}_i,i}$ (generated by a resolution prover starting with $C_i$ and
employing heuristic $\mathcal{H}_i$) can reduce the proof length of a proof for $C_i$ with CTC (CTC_neg).

Proof: Let i be a natural number and ≻ = ∅ be the ordering used for ordered resolution. For a
literal l, |l| denotes the number of symbols in l. Set
$C_i = \{\{p(a)\}, \{\neg p(x), p(f(x))\}, \{\neg q(a)\}, \{\neg q(b), q(a)\}, \{q(b)\}\}$. Let
$\mathcal{H}_i(\{l_1, \ldots, l_n\}) = \sum_{j=1}^{n} \hat{l}_j$ with $\hat{l}_j = \mathcal{H}_i(l_j)$ if $l_j$ is positive, and $\hat{l}_j = 0$
otherwise. Moreover, the weight $\mathcal{H}_i(l)$ of a literal is defined separately for the cases
$l = p(t)$ and $l = q(t)$, where t is a term.

Then it holds: For a fixed parameter i, $\mathcal{H}_i$ is a fair heuristic. Moreover, the ME proofs of
the inconsistency of $C_i$ contain only literals with top symbol q, whereas $\mathcal{F}^{\mathcal{H}_i,i}$ contains only
literals with top symbol p. Hence, no lemma of $\mathcal{F}^{\mathcal{H}_i,i}$ is applicable. □

From a theoretical point of view, we again have the negative result that, in general, useful
lemmas are not elements of $\mathcal{F}^{\mathcal{H},i}$. However, empirical studies (see Section 5) reveal that in
most cases useful lemmas are generated by a superposition prover. Hence, we assume
that useful lemmas are in $C_{BU}$ and henceforth examine which lemmas, being part of a
proof, can lead to a high proof search reduction of an ME prover.
4.2 Proof Search Reduction

The effects regarding the structure of the search space of a CTC (CTC_neg) prover caused
by the use of lemmas are closely related to the utility problem (e.g., Minton, 1990) from the
area of explanation-based learning (EBL) and macro operator learning (see also Markovitch
& Scott, 1993). At first sight, lemma use could be interpreted as introducing new edges
into the original search tree T because a sub-deduction (the proof of a lemma) can be reduced
to one inference by applying a lemma. This corresponds to macro operator learning or EBL,
where inference chains are generalized and disjunctively stored as new operators or concept
descriptions (e.g., Minton, 1990), respectively. We should notice, however, that the use of
lemmas does not only insert new edges but also new nodes into the search tree. This comes
from the fact that the structure of a tableau $T_1$ where a lemma is applied differs from the
structure of a tableau $T_2$ that is equal in all other parts but where the lemma proof is
"expanded". This has no influence on the inferences possible with $T_1$ and $T_2$ (the edges
outgoing from the nodes $v_1$ and $v_2$ that are labeled with $T_1$ and $T_2$, respectively), but it has an
effect on the value a completeness bound B assigns to the tableaux. Considering the bounds
introduced in Section 2, $T_1$ can be enumerated with a resource value which is smaller than or
equal to that needed to enumerate $T_2$. In analogy to macro operator learning and considering
these remarks, we now summarize the advantages and disadvantages of using lemmas in
connection with iterative deepening procedures.

A minor advantage of introducing a lemma is the decrease of path costs, i.e. of the costs of
reproducing the inferences needed for its proof. The major advantage of using lemmas is
that they make a restructuring of the search space possible.

On the one hand, one can save the possibly high search effort needed for proving a useful 
lemma (assuming the lemma proof can be expanded within the finite segment of T to be 
considered). On the other hand, it is possible that a closed tableau can be reached within a 
smaller resource value ("resource reduction"). Then, the reordering effects usually allow us 
to solve problems that were previously out of reach because the search procedure gets lost 
in the (usually exponentially) larger segment of the search tree defined by a larger resource. 
It is clear, however, that this advantage only holds if the segment of the search tree defined
by the lower resource value is not increased too much by the use of the lemmas. Considering
our search bounds, we can see that resource reductions normally cannot be guaranteed when
using superposition-generated lemmas in an ME prover. When using the inference bound,
a resource reduction is guaranteed if a proof length reduction can be obtained by using
lemmas. In the case of the depth and weighted-depth bounds, in general not even a proof
length reduction leads to a resource reduction.
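The difference between the bounds can be seen on a toy model. In the Python sketch below, a closed tableau is reduced to a tree of inferences (the hypothetical `Node` class); its inference count stands for the inference bound and the length of its longest branch for the depth bound. The model ignores the details of the real bounds, in particular the weighted-depth bound.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One inference of a closed tableau; children are the sub-proofs it opens."""
    label: str
    children: List["Node"] = field(default_factory=list)


def inferences(t: Node) -> int:
    """Size of the proof, i.e. the value relevant for the inference bound."""
    return 1 + sum(inferences(c) for c in t.children)


def depth(t: Node) -> int:
    """Length of the longest branch, i.e. the value relevant for a depth bound."""
    return 1 + max((depth(c) for c in t.children), default=0)


def chain(label: str, n: int) -> Node:
    """A linear sub-proof consisting of n inferences."""
    node = Node(f"{label}{n}")
    for i in range(n - 1, 0, -1):
        node = Node(f"{label}{i}", [node])
    return node


proof = Node("start", [chain("a", 4), chain("b", 3)])
# The sub-proof on the shorter branch is replaced by one lemma application.
with_lemma = Node("start", [chain("a", 4), Node("lemma")])
print(inferences(proof), depth(proof))            # 8 5
print(inferences(with_lemma), depth(with_lemma))  # 6 5
```

The lemma shortens the proof from 8 to 6 inferences, so the inference resource drops, but the depth stays at 5 because the deepest branch is untouched.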

Besides the positive effects of using lemmas, some negative effects also occur. These 
stem from an increased redundancy. The main disadvantage regarding the use of lemmas 
is the increase of the branching rate of the search tree. It is possible that a misleading 
solution of a subgoal may be obtained that could not be found before within a given finite 
segment. Even if a resource reduction from n to n' < n occurs it is possible that solutions of 
subgoals that could not be found with resource n (without lemmas) can now be found with 
resource n' and lemmas. This can reorder the search space in a hardly controllable manner. 
Considering the inference bound, in some cases tableaux which could not be enumerated
with resource n can now be enumerated with lemmas. It is possible that the use of lemmas
"spares" more than n − n' inferences. If we use the depth bound, it might be that some
branches of a tableau which can be closed in a depth greater than n without lemmas can
now be closed in a depth smaller than or equal to n'. Then a lot of superfluous inferences
can be introduced to the new minimal proof segment. Analogous effects take place when
using the weighted-depth bound (especially also when using the configuration of the bound
as described by Moser et al. (1997)).

Additionally, duplications of segments of the search space can occur. Assuming the 
expanded lemma proof lies within the initial segment of T to be considered, the use of an 
irrelevant lemma can cause a repeated exploration of parts of the search space which does 
not contain a proof: Since a superfluous solution of a subgoal is found twice (via the lemma 
and by performing the inferences needed to prove the lemma) the resulting superfluous 
inferences have to be performed twice, too. This disadvantage, however, can usually be 
overcome by using local failure caching (Letz et al., 1994). 

Besides these effects, which cause a restructuring of the search space, it is even possible
that the use of lemmas increases the number of solutions of certain subgoals that exist in
the whole search tree. This is because the use of lemmas can shadow well-known pruning
techniques like regularity, since no regularity checks are possible in the proof of a lemma.
Furthermore, the newly introduced lemmas cause the problem that in each inference a
possibly large number of lemmas has to be tested in order to determine whether inferences
are possible (applicability test). This necessitates new unification attempts.

In summary, our lemma mechanism is in general not guaranteed to produce lemmas that
lead to a proof length reduction and thus to a resource reduction. Nevertheless, experience
shows that in many cases reductions of the proof length and of the needed resource can be
obtained. When a small number of lemmas is sufficient for a resource reduction, the number
of inferences which can be spared by using lemmas can exceed the number of new superfluous
inferences by orders of magnitude. Thus, mechanisms are needed in order to select relevant
lemmas from $\{C : C \text{ is a fact}, C \in \mathcal{F}^{\mathcal{H},i}\}$ that should be inserted into $C_{BU}$. If we can find a
rather small lemma set which permits a resource reduction, then in almost all cases we can
find a proof much faster than would be possible without using lemmas (see Section 5). Note
that a large reduction of the proof length without a reduction of the needed resource
normally performs significantly worse than a small proof length reduction with a reduction
of the needed resource.

4.3 Relevancy-Based Lemma Selection 

Analogous to the foregoing section, we now want to introduce some abstract principles for
filtering lemmas based on the discussion regarding the structure of the search space. Then,
we deal with concrete heuristics applied for selecting lemmas.

4.3.1 Relevancy Criteria for Lemmas 

Since superposition provers employ a different search scheme than ME provers and since 
they have effective mechanisms for handling equality, we can assume that a few subgoals 
which are hard to solve (the proof necessitates a large resource w.r.t. a given completeness 
bound) with an ME prover can be solved with lemmas. However, when using lemmas in 
order to close some subgoals of an open tableau, the remaining open subgoals should be
easy to solve w.r.t. the given bound. Otherwise, we still cannot solve the problem within a 
smaller resource. Note that we usually cannot expect that all branches of a proof can be 
"shortened" by superposition-generated lemmas. Since our lemma generation provides no 
guarantee that useful lemmas are generated (see Section 4.1) usually only a small number of 
lemmas can be employed in a proof. All in all we obtain that "interesting" proofs (i.e. those 
we want to find) for an application of lemmas are proofs that contain many subgoals that 
are easy to solve — and can hence be solved "conventionally" within a small resource — and 
only a few hard subgoals that must be solved with lemmas. Then, we can expect that using 
lemmas leads to resource reductions. Our filter techniques should hence try to find lemmas 
that might be part of such proofs. 

Furthermore, we should estimate how many new superfluous inferences are introduced
by a lemma. The integration of new lemmas must not increase the branching rate too much. 
Otherwise, the gain of a possible resource reduction is negated by the large overhead (see 
Section 4.2). 

These criteria lead us to three different filter functions that concentrate on certain 
aspects of relevancy. The filter functions are well-suited for all of the previously introduced 
completeness bounds. Due to the vagueness of the filter criteria we use each filter function 
in order to choose some lemmas (see Section 5). Note that it is better to select a few 
unnecessary lemmas than to omit the selection of important ones. 

4.3.2 Expansion/Contraction-Based Selection 

The first filter function is called $\varphi^1_{BU}$. This function is rather simple and aims at finding
lemmas that do not lead to a high increase of the branching rate. $\varphi^1_{BU}$ accomplishes this by
using knowledge obtained in the lemma generation (preprocessing) phase of the bottom-up
prover. In detail, $\varphi^1_{BU}$ selects facts with the highest value regarding a judgment function $\psi^1_{BU}$.

Definition 4.1 (Judgment function $\psi^1_{BU}$)

For a positive unit l generated in the preprocessing phase, let ε(l) and κ(l) be the numbers
of expansion and contraction inferences, respectively, that l was involved in. Then

$$\psi^1_{BU}(l) = \kappa(l) - \varepsilon(l).$$



$\psi^1_{BU}$ counts the inferences that each lemma candidate was involved in and rates expanding
inferences negatively and contracting inferences positively. If a fact l was often involved
in an expanding inference like resolution or superposition, then l or many of its subterms
are unifiable with (subterms of) (maximal) literals of axioms or derived clauses. Hence, if l
is added to the axiomatization of the top-down prover, it can be expected that l or certain
descendants of it can very often take part in extension steps. Since this leads to a high
increase of the branching rate, we rate it negatively. Of course, we are interested in the fact
that a lemma can be applied by the ME prover. However, since lemmas of a superposition
prover are usually quite general, most of them may be used for closing occurring subgoals.
Figure 4: Simulating a superposition step with ME



Our criterion aims mainly at excluding lemmas that are applicable in too many cases and
hence introduce too many solutions of subgoals that do not lead to a refutation of the input
clauses. In contrast to expanding inferences, we rate contracting inferences positively.
Admittedly, ME provers do not have contracting inferences, but, as shown by Letz et al. (1992),
subsumption can partly be simulated by subsumption constraints. Hence, clauses that are
able to contract many other clauses can support search pruning techniques.
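Given a record of the preprocessing phase, the judgment function of Definition 4.1, κ(l) − ε(l), is a simple count. The Python sketch below assumes a hypothetical log format with one record per inference, consisting of its kind (`"expansion"` or `"contraction"`) and the participating clauses; it does not reflect Spass's actual output.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log record: (kind, clauses that took part in the inference),
# where kind is "expansion" (e.g. resolution, superposition) or "contraction".
Inference = Tuple[str, Tuple[str, ...]]


def expansion_contraction_judgment(log: Iterable[Inference]) -> Counter:
    """kappa(l) - epsilon(l) for every clause l occurring in the log."""
    score: Counter = Counter()
    for kind, clauses in log:
        if kind not in ("expansion", "contraction"):
            raise ValueError(f"unknown inference kind: {kind!r}")
        delta = 1 if kind == "contraction" else -1
        for c in clauses:
            score[c] += delta
    return score


def select_lemmas(log: Iterable[Inference], facts: Iterable[str], n: int) -> List[str]:
    """The n facts with the highest judgment (ties keep the input order)."""
    score = expansion_contraction_judgment(log)
    return sorted(facts, key=lambda f: score[f], reverse=True)[:n]


if __name__ == "__main__":
    log = [("expansion", ("p(x)", "q(a)")), ("expansion", ("p(x)", "r(b)")),
           ("contraction", ("q(a)", "s(c)")), ("contraction", ("q(a)", "t"))]
    print(select_lemmas(log, ["p(x)", "q(a)", "r(b)"], 2))  # ['q(a)', 'r(b)']
```

The fact p(x), which took part in two expanding inferences, is ranked last, while q(a), which mostly contracted other clauses, is ranked first.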

4.3.3 Derivation-Based Selection 

The second filter function $\varphi^2_{BU}$ tries to select facts that are able to close subgoals that are
very hard to solve with a connection-tableau-based prover. In order to estimate this, we 
consider the derivation history of a fact. We employ this filter function only if equality is 
involved in a problem. 

Example 4.1 Let {f(f(x)) = g(x)} and {h(b) = f(b)} be axioms. Then, the clause
{g(b) = f(h(b))} can be derived by one superposition step. However, if ¬g(b) = f(h(b)) is
a subgoal of an ME proof, its proof is more complicated (see Figure 4).

In general, if the superposition step is performed at a position p and |p| denotes the
depth of the position (above, |p| = 1), then at most |p| + 5 inferences are needed in order
to prove the result of such a superposition step. The proof necessitates at most a depth of
|p| + 3.

This example shows that the simulation of the specific equational operations of a
superposition prover necessitates a high depth as well as a high inference resource in an ME
prover. That is, lemmas derived by many superposition steps are able to close subgoals that
cannot be solved by an ME prover within small resources. Hence, if such lemmas are
applicable, large resource reductions may occur for depth- or inference-oriented bounds. The
judgment function $\psi^2_{BU}$ employs this criterion. Again, the filter function $\varphi^2_{BU}$ selects facts
with the highest value regarding this judgment function.
Definition 4.2 (Judgment function $\psi^2_{BU}$)

For a positive unit l generated in the preprocessing phase, let $\psi^2_{BU}(l)$ be defined by

$$
\psi^2_{BU}(l) =
\begin{cases}
\ldots, & l \text{ is an axiom} \\
\ldots, & l \text{ is derived by a superposition step with premises } l_1 \text{ and } l_2 \\
\ldots, & l \text{ is derived by a non-superposition inference involving the literals } l_1, \ldots, l_n \ (n \in \mathbb{N})
\end{cases}
$$

4.3.4 Complexity-Based Selection 

Our third filter function $\varphi^3_{BU}$ aims at selecting lemmas that are able to solve some hard
subgoals of ME subgoal clauses such that the resulting open subgoals can easily be solved.
Hence, the judgment function used by $\varphi^3_{BU}$ considers the sets of subgoal clauses $S^{Inf,n,C}$
or $S^{Inf,n,C}_{neg}$ for a certain resource n. For each subgoal clause sg, if a lemma l with $\psi^{sg}_{BU}(l) > 0$
exists, the lemma l with the highest judgment $\psi^{sg}_{BU}(l)$ is selected, until a maximal number
of lemmas is selected (see Section 5). This judgment is computed as follows.

Definition 4.3 (Judgment function $\psi^{sg}_{BU}$)

For a positive unit l generated in the preprocessing phase and a subgoal clause
$sg = \{l_1, \ldots, l_m\}$, let $\psi^{sg}_{BU}(l)$ be defined in the following manner:

If no subgoal can be solved with l, i.e. for all i with $1 \le i \le m$ there is no σ with
$\sigma = mgu(\neg l_i, l)$, then $\psi^{sg}_{BU}(l) = 0$. Otherwise, let $sg_U = \{u_1, \ldots, u_k\} \subseteq sg$ be a set of
literals and σ be a substitution such that σ is most general with
$\forall z, 1 \le z \le k: \sigma(\neg u_z) = \sigma(l)$. Moreover, among all subsets of sg and substitutions with
this property, let the set $sg_U$ and the substitution σ be a maximum of a function
$G^l(\{u_1, \ldots, u_k\}, \sigma)$.¹ Then, the remaining literals of sg are $sg_R = \{r_1, \ldots, r_j\} = sg \setminus sg_U$.
Let γ be a complexity function, i.e. γ maps literals to [0; 1], and high values of γ indicate that
the respective literal (subgoal) appears to be solvable. Then, $\psi^{sg}_{BU}(l)$ is computed from the
values $\gamma(r)$, $r \in sg_R$, and $\gamma(u)$, $u \in sg_U$.

We can recognize that $\psi^{sg}_{BU}(l)$ really rates l with a high value if many hard subgoals
(w.r.t. γ) of sg can be solved with l. Moreover, l is rated with a high value if there are only
a few subgoals in the subgoal clause sg that cannot be solved by lemmas and that appear
to be solvable rather easily (w.r.t. γ). In our realization, γ considers subgoals to be solvable
that are small, have a rather flat term structure, and have many variables in comparison with
the term size. In future work, we will further refine $\psi^{sg}_{BU}$ by explicitly considering the
completeness bound which is used for the top-down proof search.
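The selection loop of the third filter function, as described at the beginning of this subsection, can be sketched as follows in Python. Since γ and the exact form of the judgment are only characterised informally above, the judgment is passed in as a parameter; the toy judgment in the example (how often a lemma name occurs in a list) is purely hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

Lemma = TypeVar("Lemma")
SubgoalClause = TypeVar("SubgoalClause")


def complexity_based_selection(
        subgoal_clauses: Iterable[SubgoalClause],
        lemmas: Sequence[Lemma],
        judgment: Callable[[Lemma, SubgoalClause], float],
        max_lemmas: int) -> List[Lemma]:
    """For each subgoal clause sg, select the lemma l with the highest
    judgment(l, sg) if that value is positive; stop at max_lemmas lemmas."""
    selected: List[Lemma] = []
    for sg in subgoal_clauses:
        if len(selected) >= max_lemmas:
            break
        best: Optional[Lemma] = max(lemmas, key=lambda l: judgment(l, sg), default=None)
        if best is not None and judgment(best, sg) > 0 and best not in selected:
            selected.append(best)
    return selected


if __name__ == "__main__":
    lemmas = ["p", "q", "r"]
    subgoal_clauses = [["p", "p", "q"], ["q"], ["x"], ["r"]]
    toy_judgment = lambda l, sg: float(sg.count(l))  # hypothetical stand-in
    print(complexity_based_selection(subgoal_clauses, lemmas, toy_judgment, 3))
    # ['p', 'q', 'r']
```

Subgoal clauses for which no lemma has a positive judgment (here `["x"]`) contribute nothing, and the loop ends early once the maximal number of lemmas is reached.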

1. $sg_U$ is a rather large set of subgoals that can be solved with l such that not too many symbols are
introduced by the unifier. This is sensible because otherwise the possibility that the remaining subgoals
can be solved is decreased too much.
5. Experimental Study

In order to conduct an experimental evaluation of our integration of top-down and bottom-up
provers, we coupled two renowned provers: the ME prover Setheo and the superposition
prover Spass. We used the version of Setheo described by Moser et al. (1997) and
version 0.55 of Spass.

5.1 Architecture and Behavior of the Experimental System 

Our experimental environment can be described as follows: Each prover runs on its own
processor and obtains the initial clause set C as input. We employed a rather efficient
method to organize the preprocessing. Essentially, the top-down prover generates subgoal
clauses with one of the two variants. In our environment this does not require changes to the
top-down prover but can be performed with built-ins of the PROLOG-style input language
of Setheo. Since Setheo employs CTCneg, we experimented only with subgoal clauses
obtained with negative start clauses. These subgoal clauses are then filtered, transferred to
the bottom-up prover, and integrated into its search state. The preprocessing of the bottom-up
prover is performed in parallel with that of the top-down prover: the bottom-up prover
activates clauses with its basic heuristic until the top-down prover finishes its preprocessing.
Thus, we achieve synchronization of the provers. After that, we extract the positive units
from the set of active facts of Spass and filter them as described. Since the filter function
can employ the subgoal clauses generated by Setheo, these subgoal clauses serve both as
additional input for Spass and for the selection of lemmas for Setheo. Finally, the provers
proceed to tackle the problem in parallel with their standard settings. With this environment
we can achieve cooperation by exchanging lemmas and subgoal clauses without one concept
disturbing the other. On the contrary, both concepts support each other because results
obtained from one preprocessing can be employed in the other.
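The coordination just described can be sketched with a worker thread and an event: the bottom-up side keeps activating clauses until the top-down preprocessing signals completion, after which positive units are rated against the freshly generated subgoal clauses. All callables passed in (`generate_subgoals`, `activate`, `is_positive_unit`, `rate`) are hypothetical stand-ins; the real system ran Setheo and Spass on separate processors, and nothing below reflects their actual interfaces.

```python
import threading
from typing import Callable, List, Tuple


def cooperate(
    clauses: List[str],
    generate_subgoals: Callable[[List[str]], List[str]],  # top-down preprocessing
    activate: Callable[[List[str]], bool],                # one bottom-up step; False if nothing left
    is_positive_unit: Callable[[str], bool],
    rate: Callable[[str, List[str]], float],              # relevancy of a unit w.r.t. subgoal clauses
    max_lemmas: int = 10,
) -> Tuple[List[str], List[str]]:
    done = threading.Event()
    subgoals: List[str] = []

    def top_down() -> None:
        try:
            subgoals.extend(generate_subgoals(clauses))
        finally:
            done.set()  # end of top-down preprocessing: the synchronization point

    worker = threading.Thread(target=top_down)
    worker.start()

    # Bottom-up preprocessing runs concurrently until the top-down side is done.
    active: List[str] = []
    while not done.is_set():
        if not activate(active):
            done.wait()  # nothing left to activate: wait for the top-down side
    worker.join()

    # Pick the most relevant positive units as lemmas, rated with the subgoal clauses.
    units = [f for f in active if is_positive_unit(f)]
    lemmas = sorted(units, key=lambda u: rate(u, subgoals), reverse=True)[:max_lemmas]

    # Subgoal clauses become extra input for the bottom-up prover and lemmas
    # extra input for the top-down prover; both then continue in parallel.
    return subgoals, lemmas
```

Using an event rather than a fixed time budget mirrors the text: the length of the bottom-up preprocessing is determined by how long the top-down preprocessing takes.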

We experimented with problems from the well-known problem library
TPTP V. 1.2.1 (Sutcliffe et al., 1994; Sutcliffe & Suttner, 1998). In order to obtain a reliable
collection of data, we employed all domains contained in the TPTP as our test set. Because
these domains cover a wide range of very different problems, we assume this test set to be
reliable.

Since the TPTP contains too many problems to list and discuss the runtimes of individual
problems, we present an overview of the number of solved problems in the TPTP
library. Furthermore, we study in which domains cooperation is especially important and
discuss the main features responsible for this. In addition, we study the results in three
domains in more detail to give an impression of the decrease in runtime. These are
the domains CAT (category theory), LDA (LD-algebras), and COL (combinatory logic).
The problems in the domains CAT and LDA contain equality as well as non-Horn clauses;
COL is a Horn-equality domain.

In detail, the parameters of our experimental system are as follows. For variant 1, the
subgoal clause candidates were generated with the resource $k = 10$, which performed best
in the experiments; higher resources did not yield better results. We limited the set of
subgoal clauses by $N_{sg} = 500$. For variant 2 we employed $k_1 = k_2 = 9$ as resources.
As start clauses for an adaptive refinement we selected $N_{sel} = 5$ subgoal clauses. These
parameters allowed the efficient generation of all subgoal clauses within the initial segments
of the search tree. Usually at most 500 subgoal clauses were generated with this method,
i.e. about the same number as with variant 1. For the selection of subgoal clauses to be
transmitted to Spass we used domain-dependent parameters: for CAT, COL, and LDA we
transmitted 100 clauses; in the other domains, preliminary experiments showed that 30
clauses achieved the best results.

The bottom-up lemmas were selected via the three lemma relevancy functions defined in
Section 4. With each of these functions we selected at most 10 lemmas.
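The parameter choices above can be collected in one configuration object. A minimal sketch: the field and method names are hypothetical labels for the quantities in the text, while the values are the ones reported; the domain-dependent number of transmitted subgoal clauses is expressed as a lookup.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CooperationConfig:
    """Experiment parameters from the text; the names are hypothetical."""
    variant1_resource: int = 10                    # resource for variant 1
    max_subgoal_clauses: int = 500                 # limit on generated subgoal clauses
    variant2_resources: Tuple[int, int] = (9, 9)   # both resources of variant 2
    refinement_start_clauses: int = 5              # start clauses of the adaptive refinement
    relevancy_functions: int = 3                   # lemma relevancy functions used
    lemmas_per_function: int = 10                  # at most 10 lemmas per function
    large_transfer_domains: FrozenSet[str] = frozenset({"CAT", "COL", "LDA"})

    def subgoal_clauses_to_transfer(self, domain: str) -> int:
        """Subgoal clauses sent to the bottom-up prover for a TPTP domain."""
        return 100 if domain in self.large_transfer_domains else 30

    def max_lemmas(self) -> int:
        """Upper bound on the lemmas passed to the top-down prover."""
        return self.relevancy_functions * self.lemmas_per_function
```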

The setting of Setheo was chosen automatically as described by Moser et al. (1997).
The Spass standard heuristic essentially selects clauses of the smallest size; periodically,
clauses are selected in breadth-first order.
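Such a heuristic, smallest clause first with an occasional oldest-first (breadth-first) pick, can be sketched as below. The period `fifo_every` is a hypothetical parameter; the actual ratio used by Spass is not stated here.

```python
import heapq
import itertools
from collections import deque
from typing import Deque, List, Set, Tuple


class ClauseSelector:
    """Smallest-size-first selection with a periodic oldest-first pick.

    `fifo_every` is a hypothetical setting, not Spass's actual ratio.
    """

    def __init__(self, fifo_every: int = 5):
        self.fifo_every = fifo_every
        self.heap: List[Tuple[int, int, str]] = []   # (size, age, clause)
        self.fifo: Deque[Tuple[int, str]] = deque()  # (age, clause) by insertion
        self.taken: Set[int] = set()                 # ages already selected
        self.age = itertools.count()
        self.picks = 0

    def add(self, clause: str, size: int) -> None:
        a = next(self.age)
        heapq.heappush(self.heap, (size, a, clause))
        self.fifo.append((a, clause))

    def select(self) -> str:
        self.picks += 1
        if self.picks % self.fifo_every == 0:
            # Breadth-first pick: the oldest clause not yet selected.
            while self.fifo:
                a, clause = self.fifo.popleft()
                if a not in self.taken:
                    self.taken.add(a)
                    return clause
        # Default pick: the smallest clause not yet selected (ties by age).
        while self.heap:
            _, a, clause = heapq.heappop(self.heap)
            if a not in self.taken:
                self.taken.add(a)
                return clause
        raise IndexError("no clause left to select")
```

The periodic oldest-first pick ensures that large clauses are eventually selected even when many small clauses keep arriving.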

5.2 Experimental Results 

In the following we compare the results of our cooperative prover with those of the single
provers. This comparison is performed on the whole TPTP library. After that, we analyze
runtimes in a few selected domains of the TPTP.

5.2.1 Comparison of different variants in the TPTP

Table 1 presents results of our experiments. It shows the number of solved hard problems
in certain domains of the TPTP. Solved means that a proof could be found within 300 seconds.
We consider a problem to be hard if neither Spass nor Setheo is able to solve it within
10 seconds. The table only presents the results of those domains where hard problems exist
and where at least one hard problem could be solved by one of the considered variants.
Note that the table gives no indication of the power of the single provers, because it
does not give the total number of problems that each single prover can solve in
the whole domain. Since the TPTP contains many non-hard problems, this number is usually
much higher than the number of solved hard problems. Nevertheless, the table is sufficient
for analyzing the performance of our cooperative system, since only the hard problems are
interesting for studying the potential of cooperation.
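The notions of "hard" and "solved" used for Table 1 can be stated as a small filter over runtime records. A sketch assuming runtimes in seconds per prover or variant, with `None` for no proof found; the problem names, the label `"coop"`, and all numbers are invented for illustration and are not results from the experiments.

```python
from typing import Dict, List, Optional

# Runtimes in seconds per prover/variant; None means no proof was found.
Runtimes = Dict[str, Optional[float]]

HARD_LIMIT = 10.0    # hard: neither single prover succeeds within 10 s
SOLVE_LIMIT = 300.0  # solved: a proof is found within 300 s
SINGLE_PROVERS = ("Spass", "Setheo")


def is_hard(times: Runtimes) -> bool:
    """True if neither single prover solves the problem within HARD_LIMIT."""
    return all(
        times.get(p) is None or times[p] > HARD_LIMIT for p in SINGLE_PROVERS
    )


def solved_hard(problems: Dict[str, Runtimes], variant: str) -> List[str]:
    """Names of hard problems that `variant` solves within SOLVE_LIMIT."""
    return sorted(
        name
        for name, times in problems.items()
        if is_hard(times)
        and times.get(variant) is not None
        and times[variant] <= SOLVE_LIMIT
    )


# Invented example data (not results from the paper).
problems = {
    "P1": {"Spass": 2.0, "Setheo": None, "coop": 1.0},     # not hard
    "P2": {"Spass": 50.0, "Setheo": None, "coop": 20.0},   # hard, solved
    "P3": {"Spass": None, "Setheo": None, "coop": 400.0},  # hard, too slow
}
```

Counting `len(solved_hard(problems, v))` for each variant `v` and domain yields entries of the kind shown in the table.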

Column 1 of the table displays the name of the domain. Columns 2 and 3 present the
number of problems solved by Spass and Setheo (on a SPARCstation-20/712) when working
alone. Column 4 shows the number of problems solved by Spass when it obtains subgoal
clauses from Setheo that are generated according to variant 2, which performs better
than variant 1 (see also the following subsection). Column 5 displays the number of
problems solved by Setheo when it obtains bottom-up generated lemmas from Spass. In that
case we always employed variant 2 for generating subgoal clauses (recall that the selection
of lemmas depends on how subgoal clauses are generated). Column 6 gives the
number of problems solved by a competitive version of Spass and Setheo, in order to show
that our cooperative prover is indeed much more powerful than a simple competitive parallel
prover. Finally, column 7 gives the number of problems solved by our cooperative
system.
Table 1: Integration of top-down/bottom-up approaches by cooperative provers: solved
hard problems



The results reveal the high potential of our approach to significantly improve on the single
provers. Spass is able to solve only 55.9% of the problems that can be solved by
cooperation, Setheo only 36.5%. Competition of provers is very successful because
the provers behave very differently. But even a competitive prover consisting of
Spass and Setheo can solve only 81.5% of the problems solvable by cooperation. Hence,
cooperation is really important for increasing the success rate. Integrating subgoal
clauses into Spass increases its solvability rate by 22.9%. In most cases, subgoal
clauses take part in the search process and help to reorder the search in a favorable
manner. The use of lemmas increases Setheo's performance by 33.8%. This increase in
the solvability rate of Setheo is indeed due to resource reductions: in almost all
cases where a substantial speed-up is obtained, we could find a proof with a smaller resource.
In these cases the lemmas are used as expected, i.e. they close subgoals that occur after
few inferences and whose ME proof would require many inferences.

A closer look at the results reveals the following. A prover that already shows rather
satisfactory behavior in a specific domain can often profit from the other prover, and
cooperation can allow additional hard problems to be solved. However, if a prover is not
suitable for a certain domain, cooperation will normally not result in a significant increase
of its performance. Because Setheo and Spass behave very differently, in most cases at
least one prover can be improved in a given domain.
It is interesting to find out whether certain characteristics of problems lead to a high
or low performance of the cooperative system. We examine whether the characteristics
"a domain contains equality" and "a domain contains non-Horn problems" influence the
performance. First, note that these characteristics do not completely determine
the performance of the cooperative system: there are efficiency gains for all kinds of
problems, regardless of the type of clauses occurring in them. But we can at least
observe some tendencies.

Firstly, we can observe that the cooperation approach is especially well suited for
problems containing equality. The best results are obtained in the domains CAT, GRP, and
SET, which contain many problems with equality. When analyzing proof runs we can find
two reasons for this. In such domains Spass is able to support Setheo because it has much
stronger inferences for handling equality than Setheo. Spass can often derive "difficult"
lemmas with few inferences, i.e. lemmas whose derivation would require many inferences
by Setheo. Conversely, Setheo can support Spass because it is able to make transformations
of the proof goal that Spass cannot perform because of the fixed ordering used for
superposition. This can increase the flexibility of the proof search performed by Spass.

Secondly, we consider whether the fact that a domain contains mostly Horn or non-Horn
problems influences the performance of the cooperation approach. Among the domains
where the cooperation approach could be applied successfully, there are Horn domains
(e.g., COL) as well as non-Horn domains (e.g., SET). In the domains where no hard problems
could be solved (neither sequentially nor with cooperation), the percentage of non-Horn
clauses is often rather high (note that these domains do not appear in the table). The main
reason for this, however, appears to be that the single provers perform weakly in these
domains. A strong relationship between the performance of the cooperative prover and
whether a problem is Horn or non-Horn could not be found in the experiments.

5.2.2 Analysis of runtimes in selected domains 

Up to now we have only considered the number of solved problems. In addition, it is
interesting to analyze whether the use of subgoal clauses or lemmas can speed up the proof
search in general, i.e. also for problems that can be solved by the single provers. Short
runtimes are especially important if theorem provers are used within interactive prover
environments. We restrict ourselves to the three domains CAT, COL, and LDA and analyze
the runtimes in more detail.

Table 2 presents the runtimes for the hard problems of the three test domains. We
omitted all problems that could be solved neither by a single prover working alone nor
by any of the cooperation variants. Column 1 of the table displays the name of the problem.
Columns 2 and 3 present the runtimes of Spass and Setheo (on a SPARCstation-20/712)
when working alone; columns 4 and 5 the runtimes of Spass when it obtains subgoal clauses
from Setheo generated according to variants 1 and 2, respectively. Note that these
runtimes include the generation and selection time of the subgoal clauses as well as their
transmission to Spass. Column 6 displays the runtime of Setheo when it obtains bottom-up
generated lemmas from Spass. In that case we always employed variant 2 for generating
subgoal clauses. These runtimes also include the preprocessing of Spass and the transmission
and integration of the lemmas. Column 7 gives the runtime of a competitive version of Spass
and Setheo (minimum of the runtimes in columns 2 and 3). Finally, column 8 gives the
runtime of our cooperative system (minimum of the runtimes in columns 5 and 6).
The entry "-" means that the problem could not be solved within 1000 seconds.

Table 2: Integration of top-down/bottom-up approaches by cooperative provers: runtimes

Since all three domains contain equality, the results are better than those over the whole
TPTP. Our cooperative prover can solve all listed problems, whereas Spass is only able to
solve 32.5% and Setheo only 40.0%. A competitive prover consisting of Spass and Setheo
can solve merely 62.5% of the problems. Not only the success rate but also the runtimes
are clearly improved when using a cooperative prover. The runtimes are often decreased by
substantial factors (in spite of the fact that running the two provers in parallel consumes
twice as much total CPU time).
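Columns 7 and 8 of Table 2 are plain minima over other columns, and the success rates quoted above are shares of solved entries. A minimal sketch with invented runtimes (not the Table 2 data), assuming `None` encodes the entry "-":

```python
from typing import List, Optional

Time = Optional[float]  # seconds; None stands for "-" (no proof within 1000 s)


def fastest(*times: Time) -> Time:
    """Minimum over the solved entries, or None if no entry is solved."""
    solved = [t for t in times if t is not None]
    return min(solved) if solved else None


def success_rate(column: List[Time]) -> float:
    """Percentage of listed problems solved in a column."""
    return 100.0 * sum(t is not None for t in column) / len(column)


# Invented rows: (Spass, Setheo, Spass+subgoals var. 1,
#                 Spass+subgoals var. 2, Setheo+lemmas)
rows = [
    (None, 120.0, None, 40.0, 15.0),
    (300.0, None, 310.0, 90.0, None),
    (None, None, None, None, 60.0),
]
competitive = [fastest(r[0], r[1]) for r in rows]  # column 7: min of columns 2, 3
cooperative = [fastest(r[3], r[4]) for r in rows]  # column 8: min of columns 5, 6
```

Here `competitive` is `[120.0, 300.0, None]` and `cooperative` is `[15.0, 90.0, 60.0]`, so the cooperative column solves every listed row while the competitive one misses the third.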

When studying the runtimes and proofs obtained by Setheo we can observe the following.
If a resource reduction occurs, Setheo obtains a substantial speed-up in almost all cases.
Sometimes, for instance in the COL domain, no resource reduction takes place, but
reordering effects allow proofs to be found faster; in this situation the speed-ups are low.
Let us now take a closer look at the runtimes of Spass when using subgoal clauses. The
results of variant 1 show that a naive and uninformed generation of subgoal clauses usually
does not entail much gain: only 40% of the problems can be solved using this variant.
Variant 2, however, shows quite satisfactory behavior. Hence, an intelligent generation of
the subgoal clause pool strongly influences the efficiency.

6. Discussion 

Integration of top-down and bottom-up provers by means of cooperation is very promising
in the field of automated deduction. Since provers following different paradigms have
characteristic strengths and weaknesses, techniques that combine their strengths through
cooperation can improve the deductive system. Our approach of processing top-down
generated subgoal clauses in a bottom-up prover introduces goal orientation into the
bottom-up prover, thus combining strong redundancy control mechanisms with goal-directed
search. The use of bottom-up generated lemmas in a top-down prover can significantly
reduce proof lengths, so that proofs can be found with smaller resources.

Related approaches for supporting top-down by bottom-up inference also mainly aimed
at employing bottom-up created lemmas in a top-down prover. As in our method, in the
approaches of Schumann (1994) and Fuchs (1998a, 1999) lemmas are created in a preprocessing
phase and the input clauses are augmented by these formulas. The main difference between
these approaches and ours is the kind of lemmas used: there, the ME inference mechanism
is used to generate lemmas. This has the advantage that in some cases, in contrast to our
technique, proof length or resource reductions are guaranteed. However, the lemma
mechanisms used by Fuchs (1998b, 1998a, 1999) generate rather "easy" lemmas. Hence, their
potential with respect to the size of the resource reduction is limited.

Other approaches try to dynamically create unit lemmas during the proof run of the
ME prover (Astrachan & Stickel, 1992; Iwanuma, 1997; Astrachan & Loveland, 1997).
After each successful solution of a subgoal, a lemma may be generated and added to the
input clauses. The aim of this kind of lemma generation is to produce lemmas that
reduce the search effort by eliminating repeated sub-deductions. One criticism of this
kind of lemma generation is that it is unclear whether useful lemmas can be generated:
there is no guarantee that lemmas which can contribute to a proof, i.e. which can be
"re-used", are produced during the proof run. Furthermore, as already mentioned, the
generated lemmas are usually not as general as possible due to instantiations coming from
the solutions of previously solved subgoals. This can reduce the applicability of a lemma
although the "generalized" proof could be re-used for refuting the input clauses. Thus,
although some hard problems could only be solved with such lemma techniques (see
Astrachan & Loveland, 1997), no stable success has been reported over a large set of
problems. The main disadvantage of all approaches that only aim at supporting top-down
provers stems from the fact that in some domains, especially if equality is involved,
superposition-based provers clearly outperform ME provers. Thus, in such domains it may
be more sensible to develop techniques for supporting the more powerful bottom-up prover
than the weaker top-down prover.

In order to improve bottom-up proof search by using top-down inferences, the
following approaches have been employed. Firstly, again one prover (the bottom-up prover)
is assisted by clauses derived by another prover (the top-down prover). The approach
of Sutcliffe (1992) uses lemmas generated by a guided linear deduction system (and not
subgoal clauses) in order to support resolution-based provers. Due to the lack of goal
orientation (as described in Section 2.2), this method could not yield convincing results in
practice. Secondly, there are approaches that make bottom-up provers more goal-directed by
forcing them to work only with relevant clauses detected by top-down calculations.
The methods described by Bancilhon et al. (1986), Stickel (1994), and Hasegawa
et al. (1997) transform a set of clauses into another clause set, which is then evaluated in a
bottom-up manner. The specific transformation provides a combination of top-down and
bottom-up processing and restricts the bottom-up evaluation to relevant clauses (which bear
a connection to a proof goal). Thus, the bottom-up proof search obviously becomes more
goal-oriented. The method described by Loveland et al. (1995) also provides relevancy
testing for bottom-up calculations: based on top-down proof attempts, the relevancy of
a clause is determined dynamically during the bottom-up calculation. In contrast to our
method, in these approaches the bottom-up prover has to tackle the whole proof task. Our
approach of using top-down generated subgoal clauses in a bottom-up prover does not
provide relevancy testing of bottom-up inferences but supports the bottom-up inference
process by simplifying the original goal. Thus, the proof length may be shortened.
Furthermore, parts of the search space of the bottom-up prover that contain relevant clauses
but may be difficult to enumerate can be traversed by a single inference, which yields large
search reductions.